Beginner's guide to R: Easy ways to do basic data analysis

Part 3 of our hands-on series covers pulling stats from your data frame, and related topics.

Page 2 of 6

If you want to see just the column names in the data frame called mydata, you can use the command:

colnames(mydata)

Likewise, if you're interested in the row names -- the labels R attaches to each row, which are simply the row numbers unless the data frame was given explicit names -- use:

rownames(mydata)
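As a quick illustration, here is a minimal sketch using a small made-up data frame (the city and temp columns and their values are hypothetical, invented only for this example). It shows what the two commands return by default and after row names are set explicitly:

```r
# Hypothetical three-row data frame, for illustration only
mydata <- data.frame(
  city = c("Boston", "Austin", "Denver"),
  temp = c(48, 71, 55),
  stringsAsFactors = FALSE
)

colnames(mydata)   # "city" "temp"
rownames(mydata)   # "1" "2" "3" (the default row names are the row numbers)

# Row names can also be assigned explicitly, here from the city column
rownames(mydata) <- mydata$city
rownames(mydata)   # "Boston" "Austin" "Denver"
```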

Pull basic stats from your data frame

Because R is a statistical programming platform, it's got some pretty elegant ways to extract statistical summaries from data. To extract a few basic stats from a data frame, use the summary() function:

summary(mydata)

Figure: R's summary function. Results of the summary function on a data set called diamonds, which is included in the ggplot2 add-on package.

That returns some basic calculations for each column. If the column has numbers, you'll see the minimum and maximum values along with median, mean, 1st quartile and 3rd quartile. If it's a factor with levels such as fair, good, very good and excellent, you'll get a count of how many times each level appears in the column.

The summary() function also returns stats for a 1-dimensional vector.
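To see both behaviors without installing anything, here is a small sketch with a made-up data frame (the price and cut columns are hypothetical stand-ins, not the actual diamonds data):

```r
# Hypothetical data frame with one numeric column and one factor column
mydata <- data.frame(
  price = c(326, 334, 423, 403, 554),
  cut = factor(c("Fair", "Good", "Good", "Ideal", "Ideal"))
)

# Numeric column: min, quartiles, median, mean, max.
# Factor column: a count for each level.
summary(mydata)

# summary() also accepts a single vector, such as one column
price_stats <- summary(mydata$price)
price_stats

# On a factor, it returns the count of each level
cut_counts <- summary(mydata$cut)
cut_counts
```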

If you'd like even more statistical summaries from a single command, install and load the psych package. Install it with this command:

install.packages("psych")

You need to run this install only once on a system. Then load it with:

library(psych)

You need to run the library command each time you start a new R session if you want to use the psych package.

Now try the command:

describe(mydata)

and you'll get several more statistics from the data including standard deviation, "mad" (median absolute deviation), skew (measuring whether or not the data distribution is symmetrical) and kurtosis (whether the data have a sharper or flatter peak near the mean).
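For a feel of two of those statistics, the sketch below computes them with base R's sd() and mad() on a made-up vector (the values are hypothetical). The call to describe() is wrapped in a check so the code still runs if psych is not installed:

```r
# Hypothetical numeric vector, for illustration only
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

sd(x)                  # standard deviation
mad(x)                 # median absolute deviation; base R scales it by a constant by default
mad(x, constant = 1)   # the unscaled median absolute deviation

# Run describe() only if the psych package is available on this system
if (requireNamespace("psych", quietly = TRUE)) {
  print(psych::describe(data.frame(x = x)))
}
```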

R has the statistical functions you'd expect, including mean(), median(), min(), max(), sd() [standard deviation], var() [variance] and range(), which you can run on a 1-dimensional vector of numbers. (Several of these functions -- such as mean() and median() -- will not work on a 2-dimensional data frame.)
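Here is a short sketch of those functions on a made-up vector, followed by one common way to get per-column results from a data frame: sapply(), which runs a function on each column in turn. The names scores, a and b and all the values are hypothetical.

```r
# Hypothetical vector of numbers
scores <- c(2, 4, 4, 4, 5, 5, 7, 9)

mean(scores)     # 5
median(scores)   # 4.5
min(scores)      # 2
max(scores)      # 9
range(scores)    # 2 9
sd(scores)       # standard deviation
var(scores)      # variance

# For a data frame, apply the function to each column separately
mydata <- data.frame(a = c(1, 2, 3), b = c(10, 20, 60))
col_means <- sapply(mydata, mean)       # a = 2, b = 30
col_medians <- sapply(mydata, median)   # a = 2, b = 20
col_means
col_medians
```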
