When RStudio Chief Scientist Hadley Wickham announced that well-known stats professor and R package author Jenny Bryan was joining his team, it was interesting news.
When just a few weeks later he said that Max Kuhn at Pfizer was also coming on board, it raised more than a few eyebrows.
Two high-profile hires in three weeks? On top of another addition to his team in October? RStudio is still a somewhat small startup. Is there some sort of major expansion going on there?
I got a chance to ask Hadley Wickham directly this afternoon. His answer: The timing is something of a coincidence. "The plan isn't to add anyone else to my team in the near future," he told me. Wickham's own team there has expanded to five (dozens of others work at the company, too).
Kuhn, formerly senior director of nonclinical statistics at Pfizer, will be working on "what modeling should look like in the 'tidyverse,'" Wickham said. That's something Wickham said he's already been thinking about, "but Max is really going to be thinking about this deeply." Kuhn has authored several modeling packages for R, including caret, and co-authored the book Applied Predictive Modeling.
The "tidyverse," once nicknamed the "Hadleyverse" because it was Wickham who built most of the foundational packages, is a somewhat opinionated ecosystem for working with data within R. The structure is designed to help data flow easily among packages. (To oversimplify a bit, "tidy" refers to a data structure where there's one observation per row in a rows-and-columns framework.) The tidyverse includes popular graphics package ggplot2, data-wrangling favorite dplyr and readr for data import. The tidyverse now has its own uber-package, also called tidyverse, that will install all the varied component packages; there's also a tidyverse website, tidyverse.org.
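To make the "one observation per row" idea concrete, here is a minimal sketch in plain Python rather than R, since the concept is language-independent. The data, the column names and the `to_tidy` helper are all hypothetical, invented for illustration; in R, this kind of reshaping would typically be handled by a tidyverse package such as tidyr.

```python
# "Wide" layout: one row per country, one column per year.
# All values here are made up for illustration.
wide = [
    {"country": "A", "2015": 10, "2016": 12},
    {"country": "B", "2015": 7, "2016": 9},
]


def to_tidy(rows, id_key, name_key, value_key):
    """Reshape wide rows so that each output row holds one observation."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_key:
                continue  # the identifier is repeated on every output row
            tidy.append({id_key: row[id_key], name_key: column, value_key: value})
    return tidy


# "Tidy" layout: one row per (country, year) observation.
tidy = to_tidy(wide, "country", "year", "cases")
for record in tidy:
    print(record)
```

Two wide rows with two year columns each become four tidy rows, each with the same three fields. That uniform shape is what lets data pass from one tool to the next without custom glue.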
Jennifer Bryan, associate professor of statistics at the University of British Columbia in Vancouver, is known in the R community in part for her teaching materials such as Happy Git and GitHub for the useR and R packages like googlesheets. She is also part of the leadership team at rOpenSci, a project to help bring more R tools to the scientific community.
Wickham said Bryan's "empathy for new users of R" will bring a helpful perspective as his team works to improve the R user experience. He envisions her continuing to work on the googlesheets package, perhaps doing some maintenance on the readxl package, and maybe creating more educational materials, as well as looking at ways for R to tap into APIs. "More general Jenny awesomeness," he added.
Other hires are working on critically important if less high-profile infrastructure projects, such as integrating R with Travis for testing code and connecting R to ODBC databases. Infrastructure issues are an area where open-source platforms risk failure, Wickham said, since "I find it hard to imagine anyone would [do this type of work] out of the goodness of their heart." That's why he believes open-source software projects need financial backing that can pay for such work.
Otherwise, what might be on tap for 2017? Probably not his still-in-development ggvis package, he said when asked (ggvis was envisioned as a way to create interactive Web graphics similar to the way ggplot2 produces static visualizations). ggvis "keeps sliding down my priority list," he admitted. Instead, he expects next year to focus more on the tidyverse: making sure "all the pieces fit together really well," building out pieces that might be missing and ensuring functions and packages are well documented.
For more on the tidyverse, check out my ggplot2 cheat sheet, searchable by task.