
4 data wrangling tasks in R for advanced beginners

November 5, 2013 06:30 AM ET

Bonus special case: Grouping by date range

If you've got a series of dates and associated values, there's an extremely easy way to group them by date range such as week, month, quarter or year: R's cut() function.

Here are some sample data in a vector:

vDates <- as.Date(c("2013-06-01", "2013-07-08", "2013-09-01", "2013-09-15"))

Which creates:

[1] "2013-06-01" "2013-07-08" "2013-09-01" "2013-09-15"

The as.Date() function is important here; otherwise R will treat each item as a character string rather than a date, and cut() won't be able to group the values by date range.
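To see the difference for yourself, here's a quick sketch that checks the class of the same values with and without as.Date(). The names rawDates and daysBetween are made up for this illustration and aren't part of the original example.

```r
# rawDates is a hypothetical name used only for this illustration
rawDates <- c("2013-06-01", "2013-07-08")

class(rawDates)           # "character"
class(as.Date(rawDates))  # "Date"

# Date objects support date arithmetic; plain character strings don't
daysBetween <- as.Date(rawDates[2]) - as.Date(rawDates[1])
as.numeric(daysBetween)   # 37
```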

If you want a second vector that groups those dates by month, you can use the cut() function with this basic syntax:

vDates.bymonth <- cut(vDates, breaks = "month")

That produces:

[1] 2013-06-01 2013-07-01 2013-09-01 2013-09-01
Levels: 2013-06-01 2013-07-01 2013-08-01 2013-09-01

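Note that the result is a factor, and its levels include 2013-08-01 even though none of the sample dates falls in August. It might be easier to see what's happening if we combine these into a data frame: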

dfDates <- data.frame(vDates, vDates.bymonth)

Which creates:

      vDates vDates.bymonth
1 2013-06-01     2013-06-01
2 2013-07-08     2013-07-01
3 2013-09-01     2013-09-01
4 2013-09-15     2013-09-01

The new column gives the starting date for each month, making it easy to then slice by month.
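As a rough sketch of what "slicing by month" could look like, the code below pairs each date with a value and totals the values per month using aggregate(). The vValues numbers and the dfValues, monthlyTotals and vDates.byquarter names are invented for this example; only vDates comes from the text above.

```r
vDates <- as.Date(c("2013-06-01", "2013-07-08", "2013-09-01", "2013-09-15"))
vValues <- c(10, 20, 30, 40)  # hypothetical values, one per date

dfValues <- data.frame(date = vDates, value = vValues)
dfValues$month <- cut(dfValues$date, breaks = "month")

# Total the values within each month; months with no rows are left out
monthlyTotals <- aggregate(value ~ month, data = dfValues, FUN = sum)
monthlyTotals

# The same approach works for other ranges, such as quarters
vDates.byquarter <- cut(vDates, breaks = "quarter")
```

With these made-up values, the two September dates end up in one row with a total of 70, while June and July each get a row of their own.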

Ph.D. student Mollie Taylor's blog post "Plot Weekly or Monthly Totals in R" introduced me to this shortcut, which isn't apparent if you simply read the cut() help file. If you ever analyze or plot date-based data, this short and extremely useful post is definitely worth a read. Her downloadable code is available as a GitHub gist.

Sorting your results

For a simple sort by one column, you can get the order you want with the order() function, such as:

companyOrder <- order(companiesData$margin)

This tells you how your rows would be reordered, producing a vector of row numbers such as:

6 1 9 2 5 3 4 7 8

Chances are, you're not interested in the new order by row number but instead actually want to see the data reordered. You can use that order to reorder rows in your data frame with this code:

companiesOrdered <- companiesData[companyOrder,]

where companyOrder is the order you created earlier. Or, you can do this in a single (but perhaps less human-readable) line of code:

companiesOrdered <- companiesData[order(companiesData$margin),]

Don't forget that comma after the new order for your rows. Without it, R treats the order as a set of column positions rather than row positions, and here you'll get an error because the data frame doesn't have that many columns. Once again, a comma followed by nothing defaults to "all columns" but you can also specify just certain columns like:

companiesOrdered <- companiesData[order(companiesData$margin),c("fy", "company")]
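Here's a self-contained sketch of those steps that you can paste into R. It rebuilds companiesData from the output table shown in this article, so treat the data frame construction as an assumption about how the original was set up; companiesSubset and noComma are names added just for this illustration.

```r
# Sample data rebuilt from the output table in this article
companiesData <- data.frame(
  fy = rep(c(2010, 2011, 2012), times = 3),
  company = rep(c("Apple", "Google", "Microsoft"), each = 3),
  revenue = c(65225, 108249, 156508, 29321, 37905, 50175, 62484, 69943, 73723),
  profit = c(14013, 25922, 41733, 8505, 9737, 10737, 18760, 23150, 16978),
  margin = c(21.5, 23.9, 26.7, 29.0, 25.7, 21.4, 30.0, 33.1, 23.0)
)

# Row positions sorted by margin, lowest first
companyOrder <- order(companiesData$margin)

# Comma followed by nothing: all columns, rows in the new order
companiesOrdered <- companiesData[companyOrder, ]

# Comma followed by column names: only those columns
companiesSubset <- companiesData[companyOrder, c("fy", "company")]

# Without the comma, R reads the index as column positions and fails here
noComma <- try(companiesData[companyOrder], silent = TRUE)
inherits(noComma, "try-error")  # TRUE
```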

To sort in descending order, you'd want companyOrder to have a minus sign before the ordering column (this works because margin is numeric):

companyOrder <- order(-companiesData$margin)

And then:

companiesOrdered <- companiesData[companyOrder,]

You can put that together in a single statement as:

companiesOrdered <- companiesData[order(-companiesData$margin),]

That produces:

    fy   company revenue profit margin
8 2011 Microsoft   69943  23150   33.1
7 2010 Microsoft   62484  18760   30.0
4 2010    Google   29321   8505   29.0
3 2012     Apple  156508  41733   26.7
5 2011    Google   37905   9737   25.7
2 2011     Apple  108249  25922   23.9
9 2012 Microsoft   73723  16978   23.0
1 2010     Apple   65225  14013   21.5
6 2012    Google   50175  10737   21.4

Note how you can see the original row numbers reordered at the far left.
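The minus sign depends on the column being numeric. As an alternative sketch, order() also takes a decreasing argument, which works on text columns such as company too. The data frame below is rebuilt from the output table above, and byMarginDesc and byCompanyDesc are names invented for this example.

```r
# Sample data rebuilt from the output table in this article
companiesData <- data.frame(
  fy = rep(c(2010, 2011, 2012), times = 3),
  company = rep(c("Apple", "Google", "Microsoft"), each = 3),
  revenue = c(65225, 108249, 156508, 29321, 37905, 50175, 62484, 69943, 73723),
  profit = c(14013, 25922, 41733, 8505, 9737, 10737, 18760, 23150, 16978),
  margin = c(21.5, 23.9, 26.7, 29.0, 25.7, 21.4, 30.0, 33.1, 23.0)
)

# Same result as order(-companiesData$margin)
byMarginDesc <- companiesData[order(companiesData$margin, decreasing = TRUE), ]

# Descending sort on a text column, where a minus sign would not work
byCompanyDesc <- companiesData[order(companiesData$company, decreasing = TRUE), ]
```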

If you'd like to sort one column ascending and another column descending, just put a minus sign before the one that's descending. This is one way to sort this data first by year (ascending) and then by profit margin (descending) to see which company had the top profit margin by year:

companiesData[order(companiesData$fy, -companiesData$margin),]
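To try that two-column sort on its own, here's a sketch using the same sample data, rebuilt from the output table earlier in this article. The last step, which keeps only the first row for each year with duplicated(), is one possible way to pull out each year's top margin and isn't part of the original code; byYearThenMargin and topByYear are made-up names.

```r
# Sample data rebuilt from the output table in this article
companiesData <- data.frame(
  fy = rep(c(2010, 2011, 2012), times = 3),
  company = rep(c("Apple", "Google", "Microsoft"), each = 3),
  revenue = c(65225, 108249, 156508, 29321, 37905, 50175, 62484, 69943, 73723),
  profit = c(14013, 25922, 41733, 8505, 9737, 10737, 18760, 23150, 16978),
  margin = c(21.5, 23.9, 26.7, 29.0, 25.7, 21.4, 30.0, 33.1, 23.0)
)

# Year ascending, then margin descending within each year
byYearThenMargin <- companiesData[order(companiesData$fy, -companiesData$margin), ]

# After that sort, the first row for each year has that year's top margin
topByYear <- byYearThenMargin[!duplicated(byYearThenMargin$fy),
                              c("fy", "company", "margin")]
topByYear
```

With this data, Microsoft comes out on top for 2010 and 2011, and Apple for 2012.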


