Web scraping with R and rvest (includes video & code)

Sometimes the data you want is available on a Web page, but not in a form you can easily download. That's where Web scraping comes in. Most general-purpose programming languages have a library for collecting data from an HTML page. R does too -- a new package called rvest by Hadley Wickham, inspired by Python libraries such as Beautiful Soup.

Watch how easy it is to import data from a Web page into R. Code from the video is below.

Note: If you don't have rvest installed on your system, you can download and install it with install.packages("rvest"). Get SelectorGadget at SelectorGadget.com.

Note that the CSS on Web pages can change -- in fact, the best CSS selector for the National Weather Service forecast has already changed in the few weeks since I recorded this video. That's another good reason to use SelectorGadget, which makes it easy to find the CSS pattern you want.
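Because a selector can silently stop matching after a page redesign, it may help to fail loudly when nothing is found. Here is a minimal sketch of that idea. The helper name scrape_text() and the markup are made up for illustration and are not part of rvest or the real National Weather Service page; only read_html(), html_nodes() and html_text() come from the package.

```r
library(rvest)

# Hypothetical helper (not part of rvest): return the text matched by a CSS
# selector, or stop with a clear message when nothing matches.
scrape_text <- function(page, css) {
  nodes <- html_nodes(page, css)
  if (length(nodes) == 0) {
    stop("No nodes matched '", css, "'; the page's CSS may have changed.")
  }
  html_text(nodes)
}

# Made-up markup for illustration; not the real forecast page.
page <- read_html(
  '<html><body><p class="forecast-text">Sunny, with a high near 58.</p></body></html>'
)

scrape_text(page, ".forecast-text")   # returns the paragraph text
# scrape_text(page, ".old-class")     # would stop with the message above
```

If the error appears, rerun SelectorGadget on the live page to find the current pattern.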


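library("rvest")
# Download and parse the forecast page; read_html() replaces html(), which later rvest versions deprecated
htmlpage <- read_html("http://forecast.weather.gov/MapClick.php?lat=42.31674913306716&lon=-71.42487878862437&site=all&smap=1#.VRsEpZPF84I")
# Select the forecast period names and descriptions using the CSS pattern found with SelectorGadget
forecasthtml <- html_nodes(htmlpage, "#detailed-forecast-body b , .forecast-text")
# Extract the text from the selected nodes
forecast <- html_text(forecasthtml)
# Combine the pieces into a single string
paste(forecast, collapse = " ")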
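If you'd rather keep the forecast as a table than as one long string, one option is to select the period names and the descriptions separately and pair them up. The sketch below runs on made-up markup that only loosely imitates a forecast page, so the structure, the variable names and the assumption that both selectors return the same number of matches are all illustrative rather than a description of the real site.

```r
library(rvest)

# Made-up markup for illustration; the real page's structure may differ.
sample_html <- '<html><body>
  <div id="detailed-forecast-body">
    <div><b>Tonight</b></div>
    <div class="forecast-text">Mostly clear, with a low around 35.</div>
    <div><b>Friday</b></div>
    <div class="forecast-text">Sunny, with a high near 58.</div>
  </div>
</body></html>'

sample_page <- read_html(sample_html)

# Select each piece on its own instead of using one comma-separated selector
periods <- html_text(html_nodes(sample_page, "#detailed-forecast-body b"))
details <- html_text(html_nodes(sample_page, ".forecast-text"))

# Assumes one description per period; data.frame() errors if the lengths differ
forecast_table <- data.frame(period = periods, detail = details,
                             stringsAsFactors = FALSE)
forecast_table
```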
  

To learn more about R, see our free Beginner's Guide to R PDF download. For more R screencasts, see the rest of my R in 5 Lines or Less series.
