Beginner's guide to R: Get your data into R

In part 2 of our hands-on guide to the hot data-analysis environment, we provide some tips on how to import data in various formats, both local and on the Web.

Page 3 of 4

Other formats

There are R packages that will read files from Excel, SPSS, SAS, Stata and more. For Excel, there's the readxl package.
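Here's a minimal sketch of what reading an Excel file with readxl can look like. It assumes the readxl package is installed and uses the sample workbook that ships with the package, so the file path and the sheet name "iris" are illustrative stand-ins for your own spreadsheet.

```r
# Assumes readxl is installed: install.packages("readxl")
library(readxl)

# Path to a sample workbook bundled with readxl (a stand-in for your own .xlsx file)
path <- readxl_example("datasets.xlsx")

# List the worksheets in the workbook
sheet_names <- excel_sheets(path)
print(sheet_names)

# Read one worksheet by name into a data frame
iris_data <- read_excel(path, sheet = "iris")
head(iris_data)
```

For your own data, replace path with the location of your spreadsheet and pick the sheet by name or by position (for example, sheet = 1).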

If you'd like to connect R with a database, there are several dedicated packages, such as RPostgreSQL, RMySQL, RMongo, RSQLite and RODBC. The popular dplyr package also includes some database support.
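As a rough illustration of the database route, here is a sketch using RSQLite through the DBI package. DBI isn't mentioned above; it's the common interface that RSQLite plugs into, and it's an assumption here that both are installed. The in-memory database and the table name "cars" are hypothetical, chosen so the example runs without a database server.

```r
# Assumes DBI and RSQLite are installed: install.packages(c("DBI", "RSQLite"))
library(DBI)

# Open a temporary in-memory SQLite database (no server or file needed)
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Copy a built-in data frame into a table called "cars" (hypothetical name)
dbWriteTable(con, "cars", mtcars)

# Run a SQL query and get the result back as an R data frame
cyl_counts <- dbGetQuery(
  con,
  "SELECT cyl, COUNT(*) AS n FROM cars GROUP BY cyl ORDER BY cyl"
)
print(cyl_counts)

# Always close the connection when you're done
dbDisconnect(con)
```

Other database packages generally follow a similar connect, query, disconnect pattern, although the connection details differ.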

(You can see the entire list of available R packages at the CRAN website.)

Remote data

read.csv() and read.table() work pretty much the same to access files from the Web as they do for local data. For example, Pew Research Center data about mobile shopping are available as a CSV file for download. You can store the data in a variable called pew_data like this:

pew_data <- read.csv("http://bit.ly/11I3iuU")

It's important to make sure the file you're downloading is in an R-friendly format first: in other words, that it has a maximum of one header row, with each subsequent row having the equivalent of one data record. Even well-formed government data might include lots of blank rows followed by footnotes -- that's not what you want in an R data table if you plan on running statistical analysis functions on the file.
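If a file does arrive with extra rows, one possible cleanup in base R looks like the sketch below. The sample lines are made up for illustration, and the rule for spotting footnotes (any line without a comma) is an assumption that would need adjusting for real data.

```r
# Hypothetical raw file contents: a title line, one header row,
# two data records, blank rows and a footnote
raw_lines <- c(
  "Table 1: Hypothetical survey results",
  "region,respondents,pct_mobile",
  "North,120,41.5",
  "South,95,38.0",
  "",
  "",
  "Note: figures are illustrative only."
)

# Drop the title line so the header row comes first
lines <- raw_lines[-1]

# Drop blank rows
lines <- lines[nzchar(trimws(lines))]

# Drop footnotes: here, any line that contains no comma (an assumption)
lines <- lines[grepl(",", lines, fixed = TRUE)]

# Parse what's left as CSV: one header row, one record per row
clean <- read.csv(text = paste(lines, collapse = "\n"), stringsAsFactors = FALSE)
str(clean)
```

With a real file, readLines() can read the raw lines from a path or URL before you filter them the same way.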

Do you want Google spreadsheets data in R? You don't have to download the spreadsheet to your local system as you do with a CSV. Instead, use Jenny Bryan's googlesheets package. Authorize your account, register a sheet in your account as a Google Sheets object with mysheet <- gs_title("The Spreadsheet Title"), and then read in a worksheet with mydata <- gs_read(mysheet).

One package for basic import/export

It can get difficult to remember which package to use for what kind of data import. That's a problem the rio package (for R input/output) was designed to solve. It has three basic functions: import, export, and convert. rio works out how to handle a file from its extension. mydata <- import("myfile.xlsx") will import an Excel spreadsheet, mydata <- import("myfile.json") reads JSON, mydata <- import("myfile.html") can pull in HTML tables, and so on. It handles more than 30 formats; check the package's GitHub repository for details.
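To see the three functions side by side, here's a small sketch that assumes rio is installed. It writes the built-in mtcars data set to temporary files, so the file paths are placeholders rather than anything you need to create first.

```r
# Assumes rio is installed: install.packages("rio")
library(rio)

# Temporary file paths (placeholders for your own files)
csv_path <- tempfile(fileext = ".csv")
tsv_path <- tempfile(fileext = ".tsv")

# export(): write a data frame; the format comes from the file extension
export(mtcars, csv_path)

# import(): read the file back in, again based on the extension
cars_csv <- import(csv_path)

# convert(): turn one file format into another in a single step
convert(csv_path, tsv_path)
cars_tsv <- import(tsv_path)

head(cars_tsv)
```

Some formats rely on additional packages being installed, so an import that works for CSV may ask for an extra install the first time you try another file type.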
