Beginner's guide to R: Painless data visualization

Part 4 of our hands-on guide covers simple graphics, bar graphs and more complex charts.

Page 3 of 8

The code structure for a basic graph with ggplot() is a bit more complicated than in either plot() or qplot(); it goes as follows:

ggplot(mtcars, aes(x=disp, y=mpg)) + geom_point()

The first argument in the ggplot() function, mtcars, is fairly easy to understand -- that's the data set you're plotting. But what's with "aes()" and "geom_point()"?

"aes" stands for aesthetics -- the visual properties of the graph, such as position in space, color and shape.

"geom" is the graphing geometry you're using, such as lines, bars or the shapes of your points.

Now if "line" and "bar" also seem like aesthetic properties to you, similar to shape, well, you can either accept that's how it works or do some deep reading into the fundamentals behind the Grammar of Graphics. (Personally, I just take Wickham's word for it.)
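To make the split between data, aesthetics and geom a little more concrete, here is a minimal sketch that builds the same mtcars plot in two steps. The variable names base, p and drawn are my own; layer_data() is a ggplot2 helper that returns the values a layer will actually draw.

```r
library(ggplot2)

# Data set plus aesthetic mappings only: no points are drawn yet
# because no geom has been attached
base <- ggplot(mtcars, aes(x = disp, y = mpg))

# Adding a geom tells ggplot2 how to draw the mapped values
p <- base + geom_point()
print(p)

# Inspect what the point layer draws: one row per car,
# with x and y filled in from the disp and mpg columns
drawn <- layer_data(p)
head(drawn[, c("x", "y")])
```

Keeping the mappings in one object and the geom in another is what makes it easy to swap one geom for a different one later.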

Want a line graph instead? Simply replace geom_point() with geom_line(), as in this example that plots temperature vs. pressure in R's sample pressure data set:

ggplot(pressure, aes(x=temperature, y=pressure)) + geom_line()

Figure: Creating a line graph with ggplot2.

It may be a little confusing here since both the data set and one of its columns are called the same thing: pressure. That first "pressure" represents the name of the data frame; the second, "y=pressure," represents the column named pressure.
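If the double use of the name is hard to keep straight, a quick look at the data frame itself may help. This sketch uses only base R functions:

```r
# pressure is a data frame that ships with R
is.data.frame(pressure)

# Its two columns are named temperature and pressure
names(pressure)

# pressure$pressure is the column called pressure inside that data frame
head(pressure$pressure)

# Inside aes(), bare names such as temperature and pressure are looked up
# as columns of the data frame passed to ggplot()
```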

In these examples, I set only x and y aesthetics. But there are lots more aesthetics we could add, such as color, size and shape.
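For example, a third aesthetic can map a column to color. The sketch below assumes you want to color the mtcars points by number of cylinders; wrapping cyl in factor() is my choice, so that ggplot2 treats the values as categories rather than as a continuous range.

```r
library(ggplot2)

# color is mapped inside aes(), so it varies with the data:
# each distinct cylinder count gets its own color and a legend entry
p_color <- ggplot(mtcars, aes(x = disp, y = mpg, color = factor(cyl))) +
  geom_point()
print(p_color)
```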

You can also change where the y axis starts. ylim is not an argument of ggplot() itself; axis limits are added to the plot as a separate component. If mydata is the name of your data frame, xcol is the name of the column you want on the x axis and ycol is the name of the column you want on the y axis, one way to make the y axis start at 0 is with expand_limits():

ggplot(mydata, aes(x=xcol, y=ycol)) + geom_line() + expand_limits(y=0)
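Here is a runnable sketch of that idea, using the BOD data frame that ships with R in place of the placeholder mydata. It adds the limit as a separate expand_limits() component, which is one way (not the only one) to pull the y axis down to zero in ggplot2. The name p_zero is my own.

```r
library(ggplot2)

# BOD has two columns, Time and demand; demand never drops near zero,
# so by default the y axis would start well above 0
p_zero <- ggplot(BOD, aes(x = Time, y = demand)) +
  geom_line() +
  expand_limits(y = 0)  # widen the y axis so that it includes 0
print(p_zero)
```

expand_limits() only ever widens an axis, so no data gets dropped from the plot.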

Perhaps you'd like both lines and points on that temperature vs. pressure graph?

ggplot(pressure, aes(x=temperature, y=pressure)) + geom_line() + geom_point()

The point here (pun sort of intended) is that you can start off with a simple graphic and then add all sorts of customizations: Set the size, shape and color of the points, plot multiple lines with different colors, add labels and a ton more. See Bar and line graphs (ggplot2) for a few examples, or the R Graphics Cookbook by Winston Chang for many more.
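As a rough sketch of what a few of those customizations look like, here is the temperature vs. pressure graph again with fixed point and line settings plus labels. The specific colors, size, shape number and label text are my own picks for illustration.

```r
library(ggplot2)

# Values set outside aes() are fixed: they apply to every point
# or to the whole line instead of varying with the data
p_custom <- ggplot(pressure, aes(x = temperature, y = pressure)) +
  geom_line(color = "steelblue") +
  geom_point(size = 3, shape = 17, color = "darkred") +  # shape 17 is a filled triangle
  labs(title = "Vapor pressure of mercury",
       x = "Temperature (deg C)",
       y = "Pressure (mm Hg)")
print(p_custom)
```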

Bar graphs

To make a bar graph from the sample BOD data frame included with R, use the base R function barplot(). So, to plot the demand column from the BOD data set on a bar graph, you can use the command:

