
How to reshape data with tidyr’s new pivot functions

InfoWorld | Apr 5, 2019

See examples of tidyr’s new pivot_longer and pivot_wider functions.

Copyright © 2019 IDG Communications, Inc.

Hi. I’m Sharon Machlis at IDG Communications, here with Do More With R episode 24: Reshape data with tidyr’s new pivot functions.
I covered tidyr back in episode 12, but some changes are coming. Instead of the old gather() and spread() functions, you’ll be encouraged to use more intuitive pivot_longer() and pivot_wider() functions. Fortunately, tidyr author Hadley Wickham said on Twitter that gather() and spread() won’t be deprecated. So, your older code using them will still work in the future. But gather() and spread() won’t be maintained anymore, and he’ll try to move users to the pivot options.
First, just a very brief review of “wide” and “long” data.
Here’s an example of a wide data set. It has multiple data points in a single row, and some important information – the time period – in column names instead of the data frame itself.
And here’s what a long, tidy data set with one measure per row might look like:
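As a rough illustration of the two shapes, here is a minimal sketch in R. The numbers are made up and the column names (Name, Q1, Q2, Quarter, Revenue) are hypothetical stand-ins, not the data shown in the video.

```r
# Hypothetical wide data: one row per name, the time period lives in the column names
wide_data <- data.frame(
  Name = c("Alpha", "Beta"),
  Q1 = c(10, 30),
  Q2 = c(20, 40)
)

# The same values in long form: one measure per row, time period in its own column
long_data <- data.frame(
  Name = c("Alpha", "Alpha", "Beta", "Beta"),
  Quarter = c("Q1", "Q2", "Q1", "Q2"),
  Revenue = c(10, 20, 30, 40)
)

print(wide_data)
print(long_data)
```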
How to go from that first wide data set to a long one?
Instead of gather(), the new way will be pivot_longer(). The new function is currently available in the development version of tidyr on GitHub, in the tidyverse/tidyr repository. You can install it with packages like remotes or pacman. Here I used remotes::install_github.
I won’t run the install since it’s already on my system, but I will load the package.
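In code, those two steps might look like the sketch below. The install line is commented out so it does not run on every execution. CRAN releases of tidyr from 1.0.0 onward also include the pivot functions, so the GitHub install should only be needed with an older version.

```r
# One-time install of the development version from the tidyverse/tidyr repository:
# remotes::install_github("tidyverse/tidyr")

# Load the package
library(tidyr)

# Confirm that the installed version provides the new functions
exists("pivot_longer")
exists("pivot_wider")
```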
Next I’ll import the wide data from my spreadsheet into R with rio.
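The spreadsheet itself is not part of this transcript, so the sketch below first writes a small made-up table to a temporary CSV file (an assumption for the sake of a runnable example) and then reads it back with rio's import(), which picks a reader based on the file extension.

```r
library(rio)

# Stand-in for the spreadsheet: write made-up wide data to a temporary file
tmp <- tempfile(fileext = ".csv")
export(data.frame(Name = c("Alpha", "Beta"), Q1 = c(10, 30), Q2 = c(20, 40)), tmp)

# Import the file; for a real spreadsheet, pass its path instead of tmp
wide_data <- import(tmp)
print(wide_data)
```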
pivot_longer() uses this format:
Its four main arguments are: the data frame; cols, which are the columns you want to “pivot” to become one new column; names_to, the name you want for the new category column; and values_to, the name you want for the new value column.
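A minimal sketch of a call with all four arguments spelled out, assuming a made-up data frame with hypothetical columns Name, Q1 and Q2:

```r
library(tidyr)

# Hypothetical wide data, not the data from the video
wide_data <- data.frame(Name = c("Alpha", "Beta"), Q1 = c(10, 30), Q2 = c(20, 40))

long_data <- pivot_longer(
  wide_data,             # the data frame
  cols = c(Q1, Q2),      # columns to pivot into one new column
  names_to = "Quarter",  # name for the new category column
  values_to = "Revenue"  # name for the new value column
)
print(long_data)
```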
What’s really handy is that the cols argument can use the same type of column-selection syntax as dplyr’s select statement. You can name each column in the argument, but you don’t have to.
Here, I use an ordinary vector of column names (but without quotation marks around them).
If I don’t specify names for the category and value columns, they’ll default to “name” and “value”.
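A short sketch of that default behavior, again on made-up data with hypothetical column names:

```r
library(tidyr)

# Hypothetical wide data, not the data from the video
wide_data <- data.frame(Name = c("Alpha", "Beta"), Q1 = c(10, 30), Q2 = c(20, 40))

# Bare (unquoted) column names in c(); names_to and values_to are left out
long_default <- pivot_longer(wide_data, cols = c(Q1, Q2))

# The new columns get the default names
names(long_default)
# "Name" "name" "value"
```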
But I could also write this statement using a dplyr-like selection for “all columns starting with Q”, with starts_with():
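For example, assuming made-up data whose quarter columns (hypothetically Q1 and Q2) all begin with "Q", the selection helper picks the same columns as naming them one by one:

```r
library(tidyr)

# Hypothetical wide data, not the data from the video
wide_data <- data.frame(Name = c("Alpha", "Beta"), Q1 = c(10, 30), Q2 = c(20, 40))

# Select every column whose name starts with "Q"
long_q <- pivot_longer(wide_data, cols = starts_with("Q"))

# Same call with the columns listed explicitly, for comparison
long_explicit <- pivot_longer(wide_data, cols = c(Q1, Q2))

identical(long_q, long_explicit)
```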
And here’s the same code, but specifying the names for my new columns:
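A sketch of that variant, with "Quarter" and "Revenue" as hypothetical names for the new columns and made-up data:

```r
library(tidyr)

# Hypothetical wide data, not the data from the video
wide_data <- data.frame(Name = c("Alpha", "Beta"), Q1 = c(10, 30), Q2 = c(20, 40))

# starts_with() selection plus explicit names for the new columns
long_named <- pivot_longer(
  wide_data,
  cols = starts_with("Q"),
  names_to = "Quarter",
  values_to = "Revenue"
)
print(long_named)
```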
To turn long data into more human-readable wide data, use the pivot_wider() function with this format:
The first argument is your data frame. id_cols are all the columns you don’t want to pivot. That defaults to everything you don’t specify in the other two arguments. names_from is the column you want to pivot so each value becomes a new column; values_from is the column where the values should come from.
Be careful here: You need to explicitly name the names_from and values_from arguments, so they aren’t mistaken for the id_cols argument that is expected second.
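A minimal sketch with names_from and values_from named explicitly, using made-up long data (hypothetical columns Name, Quarter, Revenue). Here id_cols is left at its default, so Name is kept as the identifying column.

```r
library(tidyr)

# Hypothetical long data, not the data from the video
long_data <- data.frame(
  Name = c("Alpha", "Alpha", "Beta", "Beta"),
  Quarter = c("Q1", "Q2", "Q1", "Q2"),
  Revenue = c(10, 20, 30, 40)
)

# Name both arguments; do not pass them by position
wide_again <- pivot_wider(
  long_data,
  names_from = Quarter,   # each distinct value becomes a new column
  values_from = Revenue   # cell values for those new columns
)
print(wide_again)
```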
And that’s all there is to it. Thanks for watching! For more R tips, head to the Do More With R page at https go dot infoworld dot com slash more with R, all lowercase except for the R. You can also find the Do More With R playlist on the IDG Tech Talk YouTube channel. Hope to see you next episode!