Useful new R packages for data visualization and analysis

Mapping Starbucks in R

Map generated with new Leaflet package for R

The following is from a hands-on session I led at the recent Computer Assisted Reporting conference.

There's a lot of activity around R packages right now because of a new R development package called htmlwidgets, which makes it easy for people to write R wrappers for existing JavaScript libraries.

The first htmlwidgets-inspired package I want us to demo is Leaflet for R.

If you’re not familiar with Leaflet, it’s a JavaScript mapping package. To install it, you need to use the devtools package and get it from GitHub (if you don't already have devtools installed on your system, download and install it with install.packages("devtools")).


Load the library:

library(leaflet)

Step 1: Create a basic map object and add tiles

mymap <- leaflet()
mymap <- addTiles(mymap)

View the empty map by typing the object name:

mymap

Step 2: Set where you want the map to be centered and its zoom level

mymap <- setView(mymap, -84.3847, 33.7613, zoom = 17)

Add a pop-up:

mymap <- addPopups(mymap, -84.3847, 33.7616, 'Data journalists at work, <b>NICAR 2015</b>')

And now I’d like to introduce you to a somewhat new chaining function in R: %>%

This takes the result of one function and sends it to the next one, so you don’t have to keep repeating the name of the variable you’re storing things in. It works much like the one-character Unix pipe command. We could compact the code above to:

mymap <- leaflet() %>% 
  addTiles() %>%
  setView(-84.3847, 33.7613, zoom = 17) %>%
  addPopups(-84.3847, 33.7616, 'Data journalists at work, <b>NICAR 2015</b>')

View the finished product:

mymap

Or if you didn’t want to store the results in a variable for now but just work interactively:

leaflet() %>% 
  addTiles() %>%
  setView(-84.3847, 33.7613, zoom = 16) %>%
  addPopups(-84.3847, 33.7616, 'Data journalists at work, <b>NICAR 2015</b>')
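To see what %>% is doing without a map in the way, here is a minimal sketch on a plain numeric vector. It assumes the magrittr package is installed (that is where the operator originates; leaflet and dplyr re-export it). The vector and the variable names are made up for illustration.

```r
# Assumption: magrittr is installed; it provides the %>% operator.
library(magrittr)

# Made-up numbers, purely for illustration
values <- c(4, 9, 16)

# Nested calls: read from the inside out
nested <- round(mean(sqrt(values)), 1)

# Piped calls: read left to right; each result becomes
# the first argument of the next function
piped <- values %>% sqrt() %>% mean() %>% round(1)

# Both forms compute the same thing
identical(nested, piped)
```

The piped form is the same computation as the nested one; it just reads in the order the steps happen.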

Now let’s do something a little more interesting: map nearby Starbucks locations. Load the starbucks.csv data set. See data source at:

Data files for these exercises are available on my NICAR15data repository on GitHub. You can also download the Starbucks data file directly from Socrata's OpenData site in R with the code

download.file("", destfile="starbucks.csv", method="curl")
starbucks <- read.csv("", stringsAsFactors = FALSE)
atlanta <- subset(starbucks, City == "Atlanta" & State == "GA")
leaflet() %>%
  addTiles() %>%
  setView(-84.3847, 33.7613, zoom = 16) %>%
  addMarkers(data = atlanta, lat = ~Latitude, lng = ~Longitude, popup = atlanta$Name) %>%
  addPopups(-84.3847, 33.7616, 'Data journalists at work, <b>NICAR 2015</b>')

A script created by a TCU prof lets you create choropleth maps of World Bank data with a single line of code! More info here:

We don’t have time to do more advanced work with Leaflet, but you can do considerably more sophisticated GIS work with Leaflet and R. More on that at the end of this post.

More info on the Leaflet project page

A little more fun with Starbucks data: How many people are there per Starbucks in each state? Let’s load in a file of state populations:

statepops <- read.csv("acs2013_1yr_statepop.csv", stringsAsFactors = FALSE)
# A little glimpse at the dplyr library; lots more on that soon
library(dplyr)

There's a very easy way to count Starbucks by state with dplyr’s count function. The format is count(mydataframe, mycolumnname):

starbucks_by_state <- count(starbucks, State)

We'll need to add state population here. You can do that with base R’s merge or dplyr’s left_join. left_join is faster, but I find merge more intuitive.

starbucks_by_state <- merge(starbucks_by_state, statepops, all.x = TRUE, by.x="State", by.y="State") # No need to do by.x and by.y if columns have the same name

# better names

names(starbucks_by_state) <- c("State", "NumberStarbucks", "StatePopulation")

Add a new column to starbucks_by_state with dplyr's mutate function, which just means alter the data frame by adding one or more columns. Then we’ll store the result in a new data frame, starbucks_data, so as not to muck with the original.

starbucks_data <- starbucks_by_state %>%
  mutate(
    PeoplePerStarbucks = round(StatePopulation / NumberStarbucks)
  ) %>%
  select(State, NumberStarbucks, PeoplePerStarbucks) %>%
  arrange(desc(PeoplePerStarbucks))

Again we use the %>% operator, so we don’t have to keep writing things like

starbucks_data <- mutate(starbucks_by_state, PeoplePerStarbucks = round(StatePopulation / NumberStarbucks))
starbucks_data <- select(starbucks_data, State, NumberStarbucks, PeoplePerStarbucks)
starbucks_data <- arrange(starbucks_data, desc(PeoplePerStarbucks))
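If you want to check the count, merge, and mutate steps without the real files, here is a small sketch on toy data. The store list and population figures below are invented purely for illustration, and the sketch assumes dplyr is installed. Note that count() names its result column n, which is why the renaming step is needed.

```r
# Assumption: dplyr is installed.
library(dplyr)

# Hypothetical toy data: one row per store, plus made-up populations
toy_stores <- data.frame(State = c("GA", "GA", "GA", "VT"),
                         stringsAsFactors = FALSE)
toy_pops <- data.frame(State = c("GA", "VT"),
                       Population = c(9000, 600),
                       stringsAsFactors = FALSE)

# count() returns one row per state with the tally in a column named n
toy_counts <- count(toy_stores, State)

# Attach population; all.x = TRUE keeps every state that has a count
toy_merged <- merge(toy_counts, toy_pops, all.x = TRUE, by = "State")
names(toy_merged) <- c("State", "NumberStarbucks", "StatePopulation")

# Same pipeline as above, run on the toy data
toy_result <- toy_merged %>%
  mutate(PeoplePerStarbucks = round(StatePopulation / NumberStarbucks)) %>%
  select(State, NumberStarbucks, PeoplePerStarbucks) %>%
  arrange(desc(PeoplePerStarbucks))

toy_result
```

Working through a tiny case like this makes it easier to spot a mismatched column name before running the same steps on the full data.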

Can we pretend for a moment that doing a histogram of this data is meaningful :-)? Because I want to show you a cool new histogram tool in Hadley Wickham’s ggvis package, still under development:

starbucks_data %>%
  ggvis(x = ~PeoplePerStarbucks, fill := "gray") %>%
  layer_histograms()

Not a big deal? How about one with interactive sliders?

starbucks_data %>%
  ggvis(x = ~PeoplePerStarbucks, fill := "gray") %>%
  layer_histograms(width =  input_slider(1000, 20000, step = 1000, label = "width")) 

# Can even add a rollover tooltip

starbucks_data %>%
  ggvis(x = ~PeoplePerStarbucks, fill := "gray") %>%
  layer_histograms(width =  input_slider(1000, 20000, step = 1000, label = "width")) %>%
  add_tooltip(function(df) (df$stack_upr_ - df$stack_lwr_))

Time Series Graphing

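You'll need two libraries, dygraphs and xts. If they're not on your system yet, first install them with

install.packages("dygraphs")
install.packages("xts")

Then load them:

library(dygraphs)
library(xts)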

To begin, let’s run some demo code with sample data sets already included with R: monthly male and female deaths from lung diseases in the UK from 1974 to 1979. The data sets are mdeaths and fdeaths.

First we’ll create a single object from the two of them with the cbind() – combine by column – function.

lungDeaths <- cbind(mdeaths, fdeaths)
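As a quick check of what cbind() produces here, this base R sketch inspects the combined object. The name combined is just an illustrative stand-in for the object created above.

```r
# mdeaths and fdeaths ship with base R (datasets package)
combined <- cbind(mdeaths, fdeaths)

# Still a time series, now with one column per original series
is.ts(combined)
colnames(combined)

# 72 monthly observations (6 years x 12 months) in 2 columns
dim(combined)
```

Because the result is still a time series object with named columns, each column can be drawn as its own series.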

# And now here's how easy it is to create an interactive multi-series graph:

dygraph(lungDeaths)

The most “complicated” thing about dygraphs is that it is specifically for time series graphing and requires a time series object. You can create one with the base R ts() function

ts(datafactor, frequency of measurements per year, starting date as c(year, month))
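Here is a minimal sketch of that format using base R only. The six values are invented for illustration; the point is how frequency and start determine the dates attached to each observation.

```r
# Made-up monthly values, purely for illustration
rates <- c(5.1, 5.0, 4.9, 4.8, 4.7, 4.9)

# frequency = 12 means monthly data; start = c(1990, 1) means January 1990
rates_ts <- ts(rates, frequency = 12, start = c(1990, 1))

# R works out the dates from the start and the frequency
start(rates_ts)      # first observation as c(year, month)
end(rates_ts)        # last observation as c(year, month)
frequency(rates_ts)  # observations per year

# Pull out a date range without counting positions by hand
spring <- window(rates_ts, start = c(1990, 3), end = c(1990, 4))
spring
```

Note that ts() never sees actual dates in the data; it trusts that the values are evenly spaced and in order, so it is worth confirming the first and last dates match your source file.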

Read in a data file on Atlanta unemployment rates

atl_un <- read.csv("FRED-ATLA-unemployment.csv", stringsAsFactors = FALSE)
# now we need to convert this into a time series
atl_ts <- ts(data=atl_un$Unemployment, frequency = 12, start = c(1990, 1))
dygraph(atl_ts, main = "Monthly Atlanta Unemployment Rate")

More info on dygraphs:

Note: There is an existing package called quantmod that will pull a lot of financial and economic data for you and put it into xts format. It can pull data from the Federal Reserve Bank of St. Louis. I searched on their website and found out that the URL for Atlanta unemployment was

which means the code is ATLA013URN


This command

getSymbols("ATLA013URN", src="FRED")

automatically pulls the data into R in the right time-series format, storing it in a variable with the same name as the symbol, in this case ATLA013URN. Then we can use dygraph:

dygraph(ATLA013URN, main="Atlanta Unemployment")

To change the name of the data column in the ATLA013URN time series:

names(ATLA013URN) <- "rate"

Now re-graph:

dygraph(ATLA013URN, main="Atlanta Unemployment")

Aside: Quantmod has its own data visualization if you’re just exploring:

chartSeries(ATLA013URN, subset="last 10 years")

Another very new package lets you do more exploring of FRED data from the Federal Reserve Bank of St. Louis; you need a free API key from the Federal Reserve site. More info here

There’s another new package, rbokeh, implementing an R version of the Python bokeh interactive Web plotting library. I’m going to skip this one since there’s so much else to go over, but I wanted you to know about it. It’s still under development but already well documented at

One other htmlwidgets-inspired package:
