Web scraping with R and rvest (includes video & code)

Sometimes data you want is available on a Web page, but not in form you can easily download. That's where Web-scraping comes in. Most general-purpose computer languages have a library for easily collecting data from an HTML page. R does too -- a new package called rvest by Hadley Wickham, modeled after Python's Beautiful Soup.

Watch how easy it is to import data from a Web page into R. Code from the video is below.

Note: If you don't have rvest installed on your system, you can download and install it with install.packages("rvest"). Get SelectorGadget at SelectorGadget.com.

Note that CSS can change on Web pages -- in fact, the best CSS for the National Weather Service forecast has already changed in the few weeks since I recorded this video. Another good reason to use SelectorGadget, which makes it easy to find the CSS pattern  you want.


library("rvest")
htmlpage <- html("http://forecast.weather.gov/MapClick.php?lat=42.31674913306716&lon=-71.42487878862437&site=all&smap=1#.VRsEpZPF84I")
forecasthtml <- html_nodes(htmlpage, "#detailed-forecast-body b , .forecast-text")
forecast <- html_text(forecasthtml)
paste(forecast, collapse =" ")
  

To learn more about R, see our free Beginner's Guide to R PDF download For more R screencasts, see the rest of my R in 5 Lines or Less series.

To express your thoughts on Computerworld content, visit Computerworld's Facebook page, LinkedIn page and Twitter stream.
Fix Windows 10 problems with these free Microsoft tools
Shop Tech Products at Amazon