R in 5 lines or less: Data breach quick analysis

Stacked bar plot created with the R GGally package

It would take a lot more than 5 lines of code to do serious analysis of complex security data; in fact, there's an entire book on the topic, Data-Driven Security. Here, though, we're just looking at a list of security breaches announced in 2014, compiled by Privacy Rights Clearinghouse, and focusing on two categories: types of breaches and types of victims. Are there any interesting stories to tell from these two categories?

The video below shows some easy ways of viewing item counts and generating a few visualizations from the data. If you'd like to try this yourself, the 5 lines of code are after the video. Breach data can be downloaded from the Privacy Rights Clearinghouse website. I did some manual cleaning of that data and kept only the columns I named DatePublic, Organization, Entity, Type, City, State, InfoSource and Year. (Unfortunately, data about the number of people affected by each breach was not in a format useful for analysis.)
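A quick way to view item counts in base R is table(), optionally wrapped in sort() so the most common categories come first. The sketch below is not the author's code from the video; it uses a small, made-up data frame with hypothetical Entity and Type values standing in for breaches_2014.csv, so it runs without the download.

```r
# Hypothetical toy data standing in for breaches_2014.csv (values are made up)
breaches <- data.frame(
  Entity = c("MED", "MED", "BSR", "EDU", "MED", "GOV", "BSR"),
  Type   = c("PHYS", "HACK", "HACK", "DISC", "PORT", "DISC", "CARD"),
  stringsAsFactors = FALSE
)

# Count breaches per victim category, most frequent first
entity_counts <- sort(table(breaches$Entity), decreasing = TRUE)
print(entity_counts)

# Count breaches per breach type, most frequent first
type_counts <- sort(table(breaches$Type), decreasing = TRUE)
print(type_counts)
```

With the real file, the same two sort(table(...)) calls work unchanged on the Entity and Type columns.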

If you don't already have GGally on your system, remember to install it first with install.packages("GGally"); the code below then loads it with library(GGally).

library(GGally)  # provides ggfluctuation2()
breaches = read.csv("breaches_2014.csv", header = TRUE, stringsAsFactors = FALSE)
ggfluctuation2(table(breaches$Entity, breaches$Type))  # fluctuation plot of Entity vs. Type
barplot(table(breaches$Entity, breaches$Type), legend.text = TRUE, col = terrain.colors(7),
        main = "2014 breaches, via Privacy Rights Clearinghouse")
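To see which breach types dominate for each kind of victim, it can help to turn the Entity-by-Type cross-tab into row proportions with prop.table(). The sketch below is an illustrative extension, not part of the original five lines; it again uses made-up toy data with hypothetical Entity and Type values, and it sizes the color palette from the number of Entity rows instead of hard-coding 7.

```r
# Hypothetical toy data (made up) with the same Entity/Type columns as the article's CSV
breaches <- data.frame(
  Entity = c("MED", "MED", "BSR", "EDU", "MED", "GOV", "BSR"),
  Type   = c("PHYS", "HACK", "HACK", "DISC", "PORT", "DISC", "CARD"),
  stringsAsFactors = FALSE
)

# Cross-tabulate: rows = victim category (Entity), columns = breach Type
counts <- table(breaches$Entity, breaches$Type)

# Share of each breach type within each victim category (each row sums to 1)
shares <- prop.table(counts, margin = 1)
print(round(shares, 2))

# Stacked bars of raw counts: one bar per Type, segments colored by Entity
barplot(counts, legend.text = TRUE, col = terrain.colors(nrow(counts)),
        main = "Toy breach data")
```

Because barplot() stacks the rows of the table within each column, the number of colors needed equals nrow(counts), which is why the palette is computed rather than fixed.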

New to R? Check out my Beginner's Guide to R free PDF download.

See more examples of R in 5 lines or less.
