R in 5 lines or less: Data breach quick analysis

Stacked bar plot created with the R GGally package

Credit: Screenshot by Sharon Machlis of graph created with R package GGally

It would take a lot more than 5 lines of code to do serious analysis of complex security data; in fact, there's an entire book on the topic, Data-Driven Security. Here, though, we're just looking at a list of security breaches announced in 2014, compiled by Privacy Rights Clearinghouse, and focusing on two categories: types of breaches and types of victims. Are there any interesting stories to tell from these two categories?

The video below shows some easy ways to view item counts and generate a few visualizations from the data. If you'd like to try this yourself, the 5 lines of code are after the video. Breach data can be downloaded from the Privacy Rights Clearinghouse website. I did a bit of manual cleaning of that data and kept only the columns I named DatePublic, Organization, Entity, Type, City, State, InfoSource and Year. (Unfortunately, data about the number of people affected by each breach was not in a useful format for analysis.)
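The cleanup here was done by hand, but if you'd rather script the column-trimming step, a minimal base-R sketch might look like the following. It assumes the columns have already been renamed to the names listed above. The helper name keep_breach_cols and the tiny data frame are made up for illustration; they are not the actual Privacy Rights Clearinghouse file.

```r
# Columns kept for this analysis (names as described in the article)
keep_cols <- c("DatePublic", "Organization", "Entity", "Type",
               "City", "State", "InfoSource", "Year")

# Hypothetical helper: keep only the columns of interest, stopping with a
# clear message if any expected column is missing
keep_breach_cols <- function(df, cols = keep_cols) {
  missing <- setdiff(cols, names(df))
  if (length(missing) > 0) {
    stop("Missing columns: ", paste(missing, collapse = ", "))
  }
  df[, cols, drop = FALSE]
}

# Made-up example rows, NOT real breach records; "Notes" stands in for
# an extra column you want to drop
raw <- data.frame(
  DatePublic   = c("2014-01-02", "2014-01-03"),
  Organization = c("Org A", "Org B"),
  Entity       = c("MED", "EDU"),
  Type         = c("HACK", "PORT"),
  City         = c("Boston", "Austin"),
  State        = c("MA", "TX"),
  InfoSource   = c("Media", "Media"),
  Year         = c(2014, 2014),
  Notes        = c("x", "y"),
  stringsAsFactors = FALSE
)

breaches <- keep_breach_cols(raw)
print(names(breaches))
```

Failing loudly on a missing column is a deliberate choice: a silently dropped Entity or Type column would otherwise surface later as a confusing error in table().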

If you don't already have GGally on your system, remember to download and install it first with install.packages("GGally").


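breaches = read.csv("breaches_2014.csv", header = TRUE, stringsAsFactors = FALSE)
table(breaches$Type)  # count of breaches by type
library(GGally)
# ggfluctuation2() was in GGally when this was written; newer releases may not provide it
ggfluctuation2(table(breaches$Entity, breaches$Type))  # entity-by-type fluctuation plot
# Stacked bars: one bar per breach type, stacked by entity (col assumes 7 entity categories)
barplot(table(breaches$Entity, breaches$Type), legend.text = TRUE, col = terrain.colors(7),
        main = "2014 breaches, via Privacy Rights Clearinghouse")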
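The same table() output can be rearranged a little for a quick text read of the numbers before (or instead of) plotting. The sketch below uses only base R functions (sort(), addmargins() and prop.table()) on a small made-up data frame standing in for the real breach file; with the actual data you would pass breaches$Entity and breaches$Type instead of the toy columns.

```r
# Made-up stand-in for the breach data (NOT real records)
toy <- data.frame(
  Entity = c("MED", "MED", "BSO", "EDU", "MED", "BSO"),
  Type   = c("HACK", "PORT", "HACK", "DISC", "HACK", "CARD"),
  stringsAsFactors = FALSE
)

# Counts by breach type, largest first
type_counts <- sort(table(toy$Type), decreasing = TRUE)
print(type_counts)

# Entity-by-type cross-tab with row and column totals ("Sum") added
cross <- table(toy$Entity, toy$Type)
print(addmargins(cross))

# Share of each entity's breaches falling in each type (each row sums to 1)
print(round(prop.table(cross, margin = 1), 2))
```

The row-wise proportions answer a slightly different question than the stacked bar chart: not "which breach type is most common overall" but "what kind of breach is typical for this kind of victim."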

New to R? Check out my Beginner's Guide to R free PDF download.

See more examples of R in 5 lines or less.
