My ggplot2 cheat sheet: Search by task

An easy-to-use guide to dozens of useful ggplot2 R data visualization commands in a handy, searchable table.

ggplot2 101

There's a whole visualization philosophy behind ggplot2 called the "Grammar of Graphics" (that's where the gg in ggplot2 comes from), but here let's focus just on the code you need to build a few basic visualizations layer by layer.

Layer 1 defines which variables are going to do what. And that's all. It specifies things like which data frame holds your data and which columns will be on your x and y axes.

Here's your layer 1 key: When you use a property like color or size as an "aesthetic property" (aes) in this first layer, you are not setting a specific color or a specific size. You are saying something like "I want the color of my points to change based on the values of this column" and NOT "Make the colors of my points the specific color light blue." Picking your color(s) comes later.

Here's what a first layer might look like:

myplot <- ggplot(mydf, aes(x = colname1, y = colname2, color = colname3))

That says: Create a plot using data in mydf and use the following "aesthetics": Set the x axis to colname1 values in mydf, set the y axis to colname2 values in mydf and use different colors depending on the values in mydf colname3.
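To make that concrete, here is a minimal sketch using mtcars, a data frame that ships with R, in place of the placeholder mydf. The choice of wt, mpg and cyl as columns is just an assumption for illustration; any columns would do. Note that column names inside aes() are written without quotes.

```r
library(ggplot2)

# mtcars is built into R; wt, mpg and cyl are real columns in it.
# factor(cyl) treats the cylinder count as categories rather than a number.
myplot <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl)))

# Layer 1 only stores the data and the mappings.
# No geom has been added yet, so there are no layers to draw.
names(myplot$mapping)
length(myplot$layers)
```

Printing myplot at this point should give you, at most, an empty set of axes, since nothing tells R what shapes to draw.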

What layer 1 doesn't do is say what kind of visualization you want: scatterplot, bar graph, histogram, etc. For that you need layer 2: a geometry, or geom in ggplot-speak. You need these first two layers before R will actually show anything (you can add lots more layers for customizing the graph, but two is your minimum). You add a layer with, intuitively, the + symbol. Since we already stored layer 1 in myplot, we can add layer 2 for a scatterplot with:

myplot <- myplot + geom_point()
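The difference between mapping a color to a column and setting one specific color is easiest to see side by side. The following is a rough sketch, again assuming the built-in mtcars data; the object names base, mapped and fixed are made up for this example.

```r
library(ggplot2)

# Layer 1: data plus x and y mappings
base <- ggplot(mtcars, aes(x = wt, y = mpg))

# Mapped: color changes with the values in cyl, so it goes inside aes()
mapped <- base + geom_point(aes(color = factor(cyl)))

# Set: every point gets the same fixed color, so it goes outside aes()
fixed <- base + geom_point(color = "lightblue")
```

Type mapped or fixed at the console (or wrap either one in print()) to draw the plot. The first should show one color per cylinder count with a legend; the second should show all points in light blue with no legend.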

Make sure each plus sign comes at the end of a line, before the new layer it introduces. Your layers can either be all on one line, or the code after the plus can go on a new line, such as:

ggplot(mydf, aes(x = colname1)) +
  geom_histogram()

Don't put the plus sign and new layer on a new line like this:

ggplot(mydf, aes(x = colname1))
+ geom_histogram()

because R will think that first line is complete and not understand that line 2 goes with line 1.
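One way to see this for yourself is to ask R how many separate expressions it finds in each version, without running them. This is only an illustrative sketch: the strings good and bad are made-up names, and mtcars stands in for mydf.

```r
# Plus sign at the end of line 1: R keeps reading onto line 2
good <- "ggplot(mtcars, aes(x = mpg)) +\n  geom_histogram()"

# Plus sign at the start of line 2: line 1 is already a complete expression
bad <- "ggplot(mtcars, aes(x = mpg))\n+ geom_histogram()"

# parse() splits text into expressions without evaluating them
n_good <- length(parse(text = good))
n_bad <- length(parse(text = bad))

n_good  # 1
n_bad   # 2
```

In the second case R sees two statements: a plot with no geom, followed by a stray + geom_histogram() that isn't attached to anything.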

There are a whole host of customizations you can do beyond this, some of which are in the chart on page 1. For more on ggplot2 layers, see ggplot2 creator Hadley Wickham's Build a plot layer by layer.

Want to learn about formatting your data for analysis in R? Download our free PDF, Data wrangling with R.
