My ggplot2 cheat sheet: Search by task


An easy-to-use guide to dozens of useful ggplot2 R data visualization commands in a handy, searchable table.

There's a reason ggplot2 is one of the most popular add-on packages for R: It's a powerful, flexible and well-thought-out platform to create data visualizations you can customize to your heart's content.

But it also can be a bit overwhelming. While I find the logic of plot layers to be intuitive, some of the syntax can be a bit of a challenge. Unless you do a lot of work in ggplot2, I'm not sure how easy it is to remember that, for example, the simple task of "make my graph title bold" requires the rather wordy theme(plot.title = element_text(face = "bold")).
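To show that command in context, here is a minimal sketch of a complete plot with a bold title. It assumes the ggplot2 package is installed and uses R's built-in mtcars data set and a made-up headline purely for illustration.

```r
library(ggplot2)

# mtcars ships with R; it is used here only as stand-in data
p <- ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point() +                                   # scatterplot layer
  ggtitle("My headline text") +                    # add the title
  theme(plot.title = element_text(face = "bold"))  # make the title bold

# Typing p at the console (or calling print(p)) draws the plot
```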

So I've come up with a two-step method that's drop-dead simple -- at least for me -- to do my most common dataviz tasks in ggplot2. I hope it will help you, too.

Below is a cheat sheet, easily searchable by task, to see just how to do some of my favorite and most-used ggplot2 options -- everything from creating basic bar charts and line graphs to customizing colors and automatically adding annotations. If you're still somewhat of a ggplot2 newbie, page 2 of this post has a brief explanation of the ggplot2 layers concept.

Part 2 will make this even easier. I've created RStudio code snippets for several dozen of these tasks, so you don't even have to copy and paste -- or re-type -- these commands. Instead, you can download my ggplot2 code snippets to your own system. (Free registration required.)

Cheat sheet for useful ggplot2 tasks

| Task | Plot Type | Format | Note |
|---|---|---|---|
| Create basic plot object that will display something | Any | ggplot(data=mydf, aes(x=myxcolname, y=myycolname)) | data=mydf sets the overall source of your data; it must be a data frame. aes(x=colname1, y=colname2) sets which variables are mapped to the x and y axes. A geom layer must be added to this object in order for anything to display, such as + geom_point() or + geom_line(). |
| Create basic scatterplot | Scatterplot | + geom_point() | This is added to the basic ggplot object. Needs (continuous) numerical data on both axes. aes properties you can assign in ggplot() include x data, y data, and mapping color, shape or size to the value of a variable column. To set one specific color for all points, use the color argument of geom_point() outside aes(), because aesthetics are mappings from data to visual properties. |
| Set size of points | Scatterplot, points on line graph and others | + geom_point(size=mynumber) | Larger numbers make larger points. |
| Solve scatterplot issue of too many points exactly on top of each other | Scatterplot | + geom_point(position = "jitter") | Change the amount of jitter with geom_jitter(position = position_jitter(width = mynumber)). |
| Set shape of points to be all one shape | Scatterplot, points on line graph and others | + geom_point(shape=mynumber) | See chart of available shapes. |
| Set shape of points based on category | Scatterplot, points on line graph and others | + geom_point(aes(shape=mycategory)) + scale_shape_manual(values=myshapevector) | mycategory needs to be a categorical variable. See chart of available shapes. |
| Create basic line graph | Line graph | + geom_line() | This is added to the basic ggplot object. |
| Create line graph with lines of different colors by category | Line graph | + geom_line(aes(color=mycategory)) | |
| Set color of points or lines to be one color | Scatterplot, line graph and others | + geom_mychoice(color="mycolor") | Unlike with bars, here the color property sets the main color of the item. |
| Set color of points based on a specific category | Any | ggplot(mydf, aes(x=myxcolname, y=myycolname, color=mygroupingcol)) + geom_mychoice() | Default colors will be selected. |
| Set color of scatterplot points by numeric data values - define your own palette | Scatterplot | + geom_point(aes(color=mygroupingvariable)) + scale_color_gradient(low="mylowcolor", high="myhighcolor") | Continuous numeric variable needed for grouping-by-color variable when using scale_color_gradient. There are other variations with a midpoint color, specific numbers of colors and more. See docs for scale_color_gradient and scale_fill_gradient. |
| Set color of scatterplot points by categorical data values - use RColorBrewer | Scatterplot | + geom_point(aes(color=mygroupingvariable)) + scale_color_brewer(type="seq", palette="mypalettechoice") | Color grouping variable needs to be categorical/discrete, not continuous. Type can be sequential ("seq"), diverging ("div") or qualitative ("qual"); palettes can be names or numbers. See documentation. |
| Set type of line | Line graph and others with lines | + geom_line(linetype="mylinetype") | Available line types include solid, dashed, dotted, dotdash, longdash and twodash. |
| Set width of line | Line graph and others with lines | + geom_line(size=mysizenumber) | In ggplot2 3.4.0 and later, use linewidth instead of size for lines. |
| Set color of line | Line graph and others with lines | + geom_line(color="mycolor") | Color can be a color name available in R like "lightblue" or a hex value like "#0072B2". Run colors() in base R to see all available color names. |
| Create basic bar graph | Bar | + geom_bar(stat="identity") | This is added to the basic ggplot object. Needs categorical data for the x axis. stat="identity" uses values in a y column for the y axis. Without this, the graph will show counts of each value on the x axis. |
| Create basic bar graph with y axis showing count of items in x axis | Bar | + geom_bar() | This is added to the basic ggplot object. Only an x value is needed because this default counts the number of records for each x category. |
| Reorder x axis based on y column values in descending order | Bar, boxplots and others | ggplot(data = mydf, aes(x=reorder(myxcolname, -myycolname), y=myycolname)) + geom_mychoice() | Needs categorical data on x axis and numerical data on y axis. Remove the - before the y column name if you want ascending order. A geom such as geom_bar() or geom_boxplot() must be added. |
| Create bar graph grouped by category (grouped bar) | Bar | ggplot(mydf, aes(x=myxcolname, y=myycolname, fill=mygroupcolname)) + geom_bar(stat="identity", position="dodge") | Without position="dodge", a stacked bar chart is created. |
| Set fill color of bars (or other 2D items in graphs) to be all one specific color | Bar, histogram and others | + geom_mychoice(fill="mycolor"); for bar graph: + geom_bar(fill="mycolor", stat="identity") | Color can be a color name available in R like "lightblue" or a hex value like "#0072B2". Run colors() in base R to see all available color names. There's also a PDF showing R colors; demo(colors) shows some in your R session. |
| Set outline color of 2D graph items such as bars | Bar, histogram and others | + geom_mychoice(color="mycolor") | This can be confusing since "color" is not the main item color but its outline. As with fill, the color can be a color name available in R like "lightblue" or a hex value like "#0072B2". |
| Create a bar graph that will color each bar a different color | Bar | ggplot(mydf, aes(x=myxcolname, y=myycolname, fill=myxcolname)) + geom_bar(stat="identity") | |
| Customize colors for bar graph with different color for each bar - define your own palette | Bar | + scale_fill_manual(values=c("mycolor1", "mycolor2", "mycolor3")) | |
| Customize colors in a bar graph where colors have been defined to change by a category - use RColorBrewer | Bar | + scale_fill_brewer(palette="mycolorbrewerpalettename") | See available RColorBrewer palettes with display.brewer.all(n=10, exact.n=FALSE). RColorBrewer package must be loaded with library(RColorBrewer). |
| Create basic histogram | Histogram | ggplot(data=mydf, aes(x=myxcolname)) + geom_histogram() | |
| Change bin width of histogram | Histogram | + geom_histogram(binwidth=mynumber) | This sets the width of the bin, not the number of bins. |
| Set color of histogram bars to one color | Histogram | + geom_histogram(fill="mycolor") | |
| Add horizontal line to any type of graph at a specific position | Any | + geom_hline(yintercept=mynumber) | Set color with color argument, width with size arg and type with linetype, such as geom_hline(yintercept=100, color="red", size=2, linetype="dashed"). |
| Add vertical line to any type of graph at a specific position | Any | + geom_vline(xintercept=mynumber) | With categories on x axis, intercept 3 means the 3rd item on the axis. Set color with color arg, width with size arg and type with linetype, such as geom_vline(xintercept=3, color="red", size=2, linetype="dashed"). |
| Add regression line (line of best fit) to scatterplot | Scatterplot | + stat_smooth(method=lm, se=FALSE) | lm stands for linear model. se=FALSE turns off the confidence band. Change the default color by adding a color argument in stat_smooth. |
| Add regression line (line of best fit) with 95% confidence interval to scatterplot | Scatterplot | + stat_smooth(method=lm, level=0.95) | lm stands for linear model. The confidence band is drawn by default (se=TRUE); level sets its confidence level. |
| Use an already-made alternate theme for graph | Any | + theme_mychoice() | Available themes include theme_gray, theme_bw, theme_classic and theme_minimal. If you are customizing a pre-made theme, make sure to add that code after calling the initial theme_mychoice() function. |
| Add title (headline) | Any | + ggtitle("My headline text") | |
| Change headline size | Any | + theme(plot.title = element_text(size = myinteger)) | + theme(plot.title = element_text(size = rel(mynumber))), such as rel(1.5), sets the headline size relative to the plot's base font. |
| Change headline color | Any | + theme(plot.title = element_text(color = "mycolor")) | |
| Make plot headline bold | Any | + theme(plot.title = element_text(face = "bold")) | Also works for face = "italic" or "bold.italic". |
| Change x-axis title | Any | + xlab("My x-axis title text") | |
| Change y-axis title | Any | + ylab("My y-axis title text") | |
| Change value labels along the x axis for categorical variables | Any | + scale_x_discrete(labels=myvectoroflabels) | |
| Change value labels along the y axis for continuous numerical variable | Any | + scale_y_continuous(breaks=myvectorofbreaks) | scale_x_continuous works similarly for the x axis. A vector of breaks could look something like c(0,25,50,75,100) or seq(0,100,25). |
| Set y-axis minimum and maximum values | Any | + ylim(mymin, mymax) | xlim works the same for the x axis. If there are values outside your defined limits, they won't display, so you can use this to statically zoom in on a portion of your dataviz. |
| Rotate x-axis value labels | Any | + theme(axis.text.x= element_text(angle=myrotationAngle, hjust=myOptionalTweak, vjust=myOptionalTweak2)) | Rotation angle should be between 1 and 359, such as theme(axis.text.x= element_text(angle=45, hjust=1)). hjust and vjust can be needed to position the text properly with the axis. I often use + theme(axis.text.x= element_text(angle=45, hjust = 1.3, vjust = 1.2)) as settings. |
| Rotate y-axis title to be horizontal (parallel to x axis) | Any | + theme(axis.title.y = element_text(angle = 0)) | angle can take different values to rotate y-axis text in other ways. |
| Turn off automatic legend | Any | + theme(legend.position = "none") | |
| Change order of legend items | Any | mydf$mylegendcolumnNew <- factor(mydf$mylegendcolumn, levels=c(myOrderedVectorOfItems), ordered = TRUE) | While there are ways to do this in ggplot2, if order matters to you, create a variable ordered as you want in R. |
| Change legend title font size | Any | + theme(legend.title = element_text(size=mypointsize)) | |
| Change legend labels size | Any | + theme(legend.text = element_text(size=mypointsize)) | |
| Create multiple plots based on one or two variables in your data | Any | + facet_grid(mycolname1 ~ mycolname2) | Once you've set up an initial plot using one or more variables, this facet_grid "formula" plots a grid of all possible combinations of the additional variables mycolname1 and mycolname2, with mycolname1 in the rows and mycolname2 in the columns. Example: You set up a basic plot of online sales transactions by hour of day, and then make a facet_grid of all such transactions subsetted by category of merchandise and whether customers were new or returning. To use facet_grid for only 1 variable, use a dot for the other one, such as facet_grid(. ~ mycolname1). |
| Create multiple plots based on one or two variables in your data | Any | + facet_wrap(mycolname1 ~ mycolname2, ncol=myinteger) | Similar to facet_grid above, but you can manually set the number of columns or rows in your grid with ncol or nrow, and only those combinations with available values will be plotted. Use + facet_wrap(~ mycolname1) to facet by one variable, then set nrow or ncol. |
| Put multiple plots from different data on one page - gridExtra package | Any | grid.arrange(plot1, plot2, plot3..., ncol=mynumberofcolumns) | Any number of plots can be entered, separated by a comma. ncol defaults to 1. gridExtra package must be installed and loaded. |
| Add text annotations to a plot by x,y position on plot | Any | + annotate("text", x=myxposition, y=myyposition, label="My text") | There are other options for annotate besides "text" such as "rect" for rectangle with properties xmin, xmax, ymin, ymax and alpha (transparency) and optional color (border) and fill (fill color). |
| Create and auto annotate scatterplot grouped by color - directlabels package | Scatterplot | myplot <- ggplot(mydf, aes(x=myxcolname, y=myycolname, color=mygroupingcol)) + geom_point(); direct.label(myplot, "smart.grid") | directlabels package must be installed and loaded. |
| Create and auto annotate line graph where lines are different colors by category | Line graph | myplot <- ggplot(mydf, aes(x=myxcolname, y=myycolname, color=mygroupingcol)) + geom_line(); direct.label(myplot, list(last.points, hjust = 0.7, vjust = 1)) | directlabels package must be installed and loaded. first.points is another option to label at start of line instead of end. |
| Save plot | Any | ggsave(filename="myname.ext") | ggsave defaults to the most recent plot, but you can set a different plot with ggsave(filename="myname.ext", plot=myplot). File extension determines type of file created -- .pdf, .png and so on. Set width and height in inches with width and height arguments. |
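
To see how several of these rows stack into one plot, here is a minimal sketch rather than a definitive recipe. It assumes ggplot2 is installed and substitutes R's built-in mtcars data set for the placeholder mydf. The column choices, axis titles, headline and temporary output file are illustrative assumptions, and the regression layer uses se = FALSE to leave out the confidence band.

```r
library(ggplot2)

# mtcars ships with R; it stands in here for the placeholder "mydf"
mydf <- mtcars
mydf$cyl <- factor(mydf$cyl)  # make cylinders categorical for color grouping

myplot <- ggplot(data = mydf, aes(x = wt, y = mpg, color = cyl)) +
  geom_point(size = 3) +                              # scatterplot with larger points
  stat_smooth(method = lm, se = FALSE) +              # one regression line per group, no band
  geom_hline(yintercept = 20, linetype = "dashed") +  # horizontal reference line
  ggtitle("Fuel economy by weight") +
  xlab("Weight (1,000 lbs)") +
  ylab("Miles per gallon") +
  theme_minimal() +                                   # pre-made theme first...
  theme(plot.title = element_text(face = "bold"))     # ...then customizations

# Save to a temporary file; the .pdf extension sets the file type
outfile <- file.path(tempdir(), "myplot.pdf")
ggsave(filename = outfile, plot = myplot, width = 6, height = 4)
```

Because color is mapped inside aes() in the initial ggplot() call, both the points and the regression lines pick up the grouping; setting color inside a single geom instead would limit it to that layer.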