Create maps in R in 10 (fairly) easy steps

Use the R programming language to turn location-based data into interactive maps


Trust me: You will save yourself a lot of time if you run a few R commands to see whether the nhgeo$NAME vector of county names is the same as the nhdata$County vector of county names.

Do they have the same structure?

str(nhgeo$NAME)
# Factor w/ 1921 levels "Abbeville","Acadia",..: 1470 684 416 1653 138 282 1131 1657 334 791
str(nhdata$County)
# chr [1:11] "Belknap" "Carroll" "Cheshire" "Coos" "Grafton"

Problem number one: The geospatial file lists counties as R factors, while they're plain character text in the data. Change the factors to character strings with:

nhgeo$NAME <- as.character(nhgeo$NAME)
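If you're wondering why the factor matters, here's a minimal sketch using a made-up three-county vector (not the real nhgeo data). It shows that a factor and a character vector don't count as identical even when they display the same names, and that as.character() fixes that.

```r
# Toy vectors for illustration only; the names mimic the county columns above
geo_names  <- factor(c("Belknap", "Carroll", "Cheshire"))
data_names <- c("Belknap", "Carroll", "Cheshire")

# A factor stores integer codes plus labels, so this comparison is FALSE
same_before <- identical(geo_names, data_names)

# After conversion, both sides are plain character vectors
geo_names  <- as.character(geo_names)
same_after <- identical(geo_names, data_names)

print(c(before = same_before, after = same_after))
```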

Next, it can be helpful to sort both data sets by county name and then compare.

nhgeo <- nhgeo[order(nhgeo$NAME),]
nhdata <- nhdata[order(nhdata$County),]

Are the two county columns identical now? They should be; let's check:

identical(nhgeo$NAME, nhdata$County)
# [1] TRUE
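If identical() comes back FALSE with your own data, it helps to see exactly which names don't match. Here's a rough sketch using base R's setdiff() on invented name vectors; the trailing space in "Coos " is a hypothetical problem I planted for the example, not something in the New Hampshire files.

```r
# Invented example vectors; the second has a stray trailing space
geo_names  <- c("Belknap", "Carroll", "Coos")
data_names <- c("Belknap", "Carroll", "Coos ")

# Names in the geospatial file with no exact match in the data
missing_from_data <- setdiff(geo_names, data_names)

# Names in the data with no exact match in the geospatial file
missing_from_geo <- setdiff(data_names, geo_names)

print(missing_from_data)
print(missing_from_geo)

# Stray whitespace is a common culprit; trimws() removes it
data_names_clean <- trimws(data_names)
print(identical(geo_names, data_names_clean))
```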

Now we can join the two files. The sp package's merge function is pretty common for this type of task, but I like tmaptools' append_data() because of its intuitive syntax and because it allows the names of the two join columns to be different.

nhmap <- append_data(nhgeo, nhdata, key.shp = "NAME", key.data="County")
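To make the idea of a keyed join with differently named columns concrete, here's a small sketch on ordinary data frames using base R's merge(). It is not what append_data() does internally, and the area and percentage values are made up for the example.

```r
# Toy stand-ins for the geospatial attributes and the election data
geo <- data.frame(NAME = c("Belknap", "Carroll", "Cheshire"),
                  area = c(1, 2, 3),
                  stringsAsFactors = FALSE)
votes <- data.frame(County = c("Carroll", "Belknap", "Cheshire"),
                    SandersPct = c(0.6, 0.5, 0.7),
                    stringsAsFactors = FALSE)

# by.x and by.y name the key column in each data frame,
# much like key.shp and key.data in append_data()
joined <- merge(geo, votes, by.x = "NAME", by.y = "County")

print(joined)
```

Row order in the two inputs doesn't matter here, because rows are matched on the key values rather than on position.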

You can see the new data structure with:

str(nhmap)

Step 5: Create a static map

The hard part is done: finding data, getting it into the right format and merging it with geospatial data. Now, creating a simple static map of Sanders' margins by county in number of votes is as easy as:

qtm(nhmap, "SandersMarginVotes")

and mapping margins by percentage:

qtm(nhmap, "SandersMarginPctgPoints")


We can see that there's some difference between which areas gave Sanders his highest percentage win and which ones gave him the largest advantage in number of votes.

For more control over the map's colors, borders and such, use the tm_shape() function, which uses a ggplot2-like syntax to set fill, border and other attributes:

tm_shape(nhmap) +
  tm_fill("SandersMarginVotes", title="Sanders Margin, Total Votes", palette = "PRGn") +
  tm_borders(alpha=.5) +
  tm_text("NAME", size=0.8)

The first line above sets the geodata file to be mapped, while tm_fill() sets the data column to use for mapping color values. The "PRGn" palette argument is a ColorBrewer palette of purples and greens — if you're not familiar with ColorBrewer, you can see the various palettes available at colorbrewer2.org. Don't like the ColorBrewer choices? You can use built-in R palettes or set your own color HEX values manually instead of using a named ColorBrewer option.
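If you'd like to try your own colors, here's a minimal sketch of building a palette as a plain vector of hex values with base R's grDevices functions. The three anchor colors are my own picks for illustration, and the assumption is that you would then pass the resulting vector as the palette argument in place of "PRGn".

```r
# Three hand-picked hex colors (purple, near-white, green) as anchor points
anchors <- c("#762A83", "#F7F7F7", "#1B7837")

# colorRampPalette() returns a function that interpolates between the anchors
make_palette <- colorRampPalette(anchors)
custom_pal <- make_palette(5)

# A built-in R palette, for comparison
builtin_pal <- heat.colors(5)

print(custom_pal)
print(builtin_pal)
```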

There are also a few built-in tmap themes, such as tm_style_classic:

tm_shape(nhmap) +
  tm_fill("SandersMarginVotes", title="Sanders Margin, Total Votes", palette = "PRGn") +
  tm_borders(alpha=.5) +
  tm_text("NAME", size=0.8) +
  tm_style_classic()

You can save static maps created by tmap by using the save_tmap() function:

nhstaticmap <- tm_shape(nhmap) +
  tm_fill("SandersMarginVotes", title="Sanders Margin, Total Votes", palette = "PRGn") +
  tm_borders(alpha=.5) +
  tm_text("NAME", size=0.8)
save_tmap(nhstaticmap, filename="nhdemprimary.jpg")

The filename extension can be .jpg, .svg, .pdf, .png and several others; tmap will then produce the appropriate file, defaulting to the size of your current plotting window. There are also arguments for width, height, dpi and more; run ?("save_tmap") for more info.

If you'd like to learn more about available tmap options, package creator Martijn Tennekes posted a PDF presentation on creating maps with tmap as well as the "tmap in a nutshell" vignette.

Step 6: Create palette and pop-ups for interactive map

The next map we'll create will let users click to see underlying data as well as switch between maps, thanks to RStudio's Leaflet package that gives an R front-end to the open-source JavaScript Leaflet mapping library.

For a Leaflet map, there are two extra things we'll want to create in addition to the data we already have: a color palette and pop-up window contents.

For the palette, we specify the data range we're mapping and what kind of color palette we want: both the particular colors and the type of color scale. There are four built-in types:

  • colorNumeric is for a continuous range of colors from low to high, so you might go from a very pale blue all the way to a deep dark blue, with many gradations in between.
  • colorBin maps a set of numerical data to a set of discrete bins, defined either by exact breaks or by a specific number of bins, for things like "low," "medium" and "high".
  • colorQuantile maps numerical data into groups where each group (quantile) has the same number of records — often used for income levels, such as bottom 20%, next-lowest 20% and so on.
  • colorFactor is for non-numerical categories where no numerical value makes sense, such as countries in Europe that are part of the Eurozone and those that aren't.
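The difference between binning by value and binning by quantile is easier to see with numbers. This sketch uses base R's cut() and quantile() on a made-up vector with one outlier; it only illustrates the two grouping ideas and is not how Leaflet's colorBin and colorQuantile are implemented.

```r
# Made-up values with one large outlier
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 100)

# Equal-width bins: the outlier stretches the range,
# so almost everything lands in the first bin
equal_width <- cut(x, breaks = 3)
width_counts <- as.integer(table(equal_width))

# Quantile bins: breaks are chosen so each bin holds the same number of records
q_breaks <- quantile(x, probs = seq(0, 1, length.out = 4))
by_quantile <- cut(x, breaks = q_breaks, include.lowest = TRUE)
quantile_counts <- as.integer(table(by_quantile))

print(width_counts)
print(quantile_counts)
```

With skewed data like this, a quantile scale spreads the colors across the records, while a bin scale keeps the colors tied to the actual values.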

Create a Leaflet palette with this syntax:

mypalette <- colorFunction(palette = "colors I want", domain = mydataframe$dataColumnToMap)

where colorFunction is one of the four scale types above, such as colorNumeric() or colorFactor(), and "colors I want" is a vector of colors.

Just to change things up a bit, I'll map where Hillary Clinton was strongest, the inverse of the Sanders maps. To map Clinton's vote percentage, we could use this palette:

clintonPalette <- colorNumeric(palette = "Blues", domain=nhmap$ClintonPct)

where "Blues" is a range of blues from ColorBrewer and domain is the data range of the color scale. This can be the same as the data we're actually plotting but doesn't have to be. colorNumeric means we want a continuous range of colors using numeric variables, not specific categories like candidate names.
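It may help to know that the palette you get back is itself a function. Here's a small sketch, assuming the leaflet package is installed, that builds the same kind of palette on a hypothetical vector of vote shares (standing in for nhmap$ClintonPct) and then asks it for colors.

```r
library(leaflet)  # assumes the leaflet package is installed

# Hypothetical vote shares standing in for nhmap$ClintonPct
clinton_pct <- c(0.35, 0.42, 0.48)

# Same call shape as clintonPalette above, but on the toy vector
toy_palette <- colorNumeric(palette = "Blues", domain = clinton_pct)

# The palette is a function: pass in values, get back hex color strings
toy_colors <- toy_palette(clinton_pct)
print(toy_colors)
```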

We'll also want to add a pop-up window — what good is an interactive map without being able to click or tap and see underlying data?

Aside: For the pop-up window text display, we'll want to turn the decimal numbers for votes such as 0.7865 into percentages like 78.7%. We could do it by writing a short formula, but the scales package has a percent() function to make this easier. Install (if you need to) and load the scales package:

install.packages("scales")
library("scales")
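For comparison, the "short formula" route might look something like this in base R. The to_pct() name is my own invention for the sketch, and its output formatting won't necessarily match what percent() from scales produces.

```r
# Hypothetical helper: multiply by 100, round, and tack on a percent sign
to_pct <- function(x, digits = 1) {
  paste0(round(100 * x, digits), "%")
}

# Works on a whole vector at once
shares <- c(0.5, 0.25)
labels <- to_pct(shares)
print(labels)
```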

Content for a pop-up window is just a simple combination of HTML and R variables, such as:

nhpopup <- paste0("County: ", nhmap$County, "<br />",
                  "Sanders ", percent(nhmap$SandersPct), " - Clinton ", percent(nhmap$ClintonPct))

If you're not familiar with paste0, it's a concatenate function that can join text and variable values into a single character string.

I'd rather the county name column be County instead of NAME, so I'll take care of that with dplyr's rename() function. The pop-up code refers to nhmap$County, so run the rename before creating the pop-up text:

nhmap <- rename(nhmap, County = NAME)
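Back to paste0 for a moment: it works on whole vectors, which is why a single call produces one pop-up string per county. Here's a minimal sketch with invented county names and already-formatted percentages.

```r
# Invented values for illustration; not the real primary results
county  <- c("Belknap", "Carroll")
sanders <- c("60%", "55%")

# Fixed text is recycled; vector arguments are combined element by element
popup_text <- paste0("County: ", county, "<br />Sanders ", sanders)

print(popup_text)
```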

Step 7: Generate an interactive map

Now, the map code:

leaflet(nhmap) %>%
  addProviderTiles("CartoDB.Positron") %>%
  addPolygons(stroke = FALSE,
              smoothFactor = 0.2,
              fillOpacity = .8,
              popup = nhpopup,
              color = ~clintonPalette(ClintonPct))

Let's go over the code. leaflet(nhmap) creates a leaflet map object and sets nhmap as the data source. addProviderTiles("CartoDB.Positron") sets the background map tiles to CartoDB's attractive Positron design. There's a list of free background tiles and what they look like on GitHub if you'd like to choose something else.

The addPolygons() function does the rest, putting the county shapes on the map and coloring them accordingly. stroke=FALSE says no border around the counties, smoothFactor controls how much the polygon outlines are simplified, fillOpacity sets the opacity of the colors, popup sets the contents of the popup window, and color sets both the palette and the data that should be mapped to it. The tilde before the palette name turns the expression into a one-sided formula, which tells Leaflet to evaluate it against the data passed to leaflet().
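If the tilde still seems mysterious, this base R sketch shows the general idea of a one-sided formula: the expression is stored unevaluated and only run later against a data frame. It is an illustration of the concept with a toy data frame, not Leaflet's actual internal code.

```r
# Toy data frame standing in for the map's attribute data
toy_data <- data.frame(ClintonPct = c(0.4, 0.5))

# The tilde captures the expression instead of evaluating it right away;
# there is no ClintonPct object in the workspace, and that's fine
f <- ~ ClintonPct * 2

# Later, the stored expression is evaluated with the data frame's columns in scope
result <- eval(f[[2]], envir = toy_data)
print(result)
```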

Here's the result:

Basic interactive map created in R and RStudio's Leaflet package. Click on an area to see the underlying data.

You may get an error message about an "inconsistent datum" along with your map. Projections are a complicated issue when mapping, but basically, you want your data projection to match that of the underlying map tiles. This ensures that everything's consistent in terms of the scheme used to represent points on a 3D sphere in two dimensions. To fix this, you can add the projection recommended in the error message with

nhmap_projected <- sf::st_transform(nhmap, "+proj=longlat +datum=WGS84")

and then run the map code above with

leaflet(nhmap_projected) %>%

as the first line.

The Leaflet package has a number of other features we haven't used yet, including adding legends and the ability to turn layers on and off. Both will be very useful when mapping categories with at least three choices, such as the 2016 Republican primary.
