Super Bowl data point: All wins aren't equal

When the 12-4 Patriots went to Denver for the AFC Championship game, it was arguably a fairly even match. New England had beaten the Broncos in the regular season, and the Broncos' record was just one game better at 13-3.

But it turns out all wins aren't created equal.

Yes, the Patriots beat the Broncos, but that was when the Pats were at home. On the road, not only was New England just 4-4, but away from Gillette Stadium they didn't beat a single team that finished over .500.

So that got me wondering about this weekend's two Super Bowl teams and their identical 13-3 regular-season records. How many of those wins were against tough teams? And could either defeat tough challengers away from their home stadiums?

This is admittedly a somewhat inexact science -- for example, beating the Packers while Aaron Rodgers was out was likely an easier victory than after he came back, regardless of the Packers' final record. But graphing opponents' final records does give a little more data than simply "13-3."

Here are your dataviz details:

[Chart: Denver Broncos, 13-3 record]
[Chart: Seattle Seahawks, 13-3 record]

It turns out that Denver was 4-3 against winning teams this year, including a respectable 2-2 record on the road. And almost half of their wins came against teams that were at least .500 for the season. Seattle was also 4-3 against winning teams and 2-2 on the road. While the Seahawks didn't play any teams that finished at exactly 8-8, they did have four wins against teams that ended up 7-9 (including the St. Louis Rams, two of whose nine losses came at the hands of Seattle).

And despite the reputation that both teams have for huge home-field advantages, it turns out that both were 6-2 on the road.
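If you want to try this kind of split yourself, here is a minimal R sketch of the tally. It runs on a made-up six-game schedule, not the real 2013 results, and the data frame `sched` and its columns `opp_pct`, `site` and `result` are hypothetical names chosen just for this illustration.

```r
# Hypothetical six-game schedule; these are NOT real NFL results.
sched <- data.frame(
  opp_pct = c(0.750, 0.625, 0.500, 0.438, 0.688, 0.250),  # opponent's final win pct
  site    = c("Home", "Away", "Home", "Away", "Away", "Home"),
  result  = c("W", "L", "W", "W", "W", "L"),
  stringsAsFactors = FALSE
)

# Keep only games against opponents that finished above .500
vs_winning <- sched[sched$opp_pct > 0.5, ]

# Cross-tabulate site by result to get the home/road split
split_tab <- table(vs_winning$site, vs_winning$result)
print(split_tab)

# Overall record against winning teams as a "W-L" string
record <- paste(sum(vs_winning$result == "W"),
                sum(vs_winning$result == "L"), sep = "-")
print(record)
```

Changing the filter to `opp_pct >= 0.5` would count .500 teams as well, which is the "at least .500" cut mentioned above.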

Conclusion? If this data matters in determining whether teams are roughly equivalent, it could be a close game Sunday.

This data was collected, analyzed and graphed using R (the R Project for Statistical Computing) and several add-on R packages. It's not particularly elegant, but if you're interested, you can see the R code used for this blog post on the next page.

Update: It turns out the data point that mattered was the Seattle Seahawks' number-one-ranked defense. 

# Get team records

library(XML)

library(doBy)

library(ggplot2)

# Read the standings table

url1 <- "http://www.footballdb.com/standings.html"

table = readHTMLTable(url1, which=2, header=TRUE, skip.rows=1)

colnames(table)[1] <- "Team"

delete.rows <- c(5,6,11,12,17,18,23,24,25,30,31,36,37,42,43)

records <- table[-delete.rows, ]

# Get game results

url2 <- "http://www.pro-football-reference.com/years/2013/games.htm"

tables.results <- readHTMLTable(url2, which=1, header=TRUE)

# Drop the repeated header rows embedded in the table

games <- subset(tables.results, Week != "Week")

games <- games[, -4]

names(games)[5] <- "atsign"

# Use regular season weeks only

games <- games[1:256,]

# Convert the points columns in the games data frame into numbers. They were read in as factors, so convert to characters first, then numbers

games$PtsW <- as.character(games$PtsW)

games$PtsL <- as.character(games$PtsL)

games$PtsW <- as.numeric(games$PtsW)

games$PtsL <- as.numeric(games$PtsL)

# Convert W L T Pct columns in records data frame to numbers

records$W <- as.character(records$W)

records$L <- as.character(records$L)

records$T <- as.character(records$T)

records$Pct <- as.character(records$Pct)

records$W <- as.numeric(records$W)

records$L <- as.numeric(records$L)

records$T <- as.numeric(records$T)

records$Pct <- as.numeric(records$Pct)

library(stringr)

# Make the games column headers R friendly by changing slashes to .

names(games) <- str_replace_all(names(games), "/", '.')

# Deal with ties: pull out any games where both teams scored the same

ties <- games[games$PtsW == games$PtsL, ]

# Converting factors to characters here.

games$Winner.tie <- as.character(games$Winner.tie)

games$Loser.tie <- as.character(games$Loser.tie)

records$Team <- as.character(records$Team)

games$atsign <- as.character(games$atsign)

# Check to make sure the teams are named the same in both data frames

x <- games$Winner.tie %in% records$Team

y <- games$Loser.tie %in% records$Team

# Stop here if any team name fails to match

stopifnot(all(x), all(y))

# OK I'm good to go!

# Now, add columns to the games data frame with winner record and loser record

# Write a quick function to extract winning percentage of a team from Records database

get_win_pct <- function(myteam, df=records){

  pct <- subset(df,Team==myteam, select="Pct")

  return(pct[1,1])

}

# Add columns with records of winners and losers

games$Winner.record <- mapply(get_win_pct, games$Winner.tie)

games$Loser.record <- mapply(get_win_pct, games$Loser.tie)

# In the original data format, an @ in this column means the winner was the

# visiting team, so the loser was at home; otherwise the winner was at home.

games <- transform(games, homeLocation = ifelse(atsign=="@", Loser.tie, Winner.tie))

# Convert the games$Date column into actual R Date objects

library(lubridate)

games$Date2 <- paste(games$Date, "2013", sep=", ")

games$Date2 <- mdy(games$Date2)

# This function gets the necessary info for a single team

get_info <- function(myteam, df=games){

  wins <- subset(df, Winner.tie==myteam, select=c("Date", "Winner.tie", "Winner.record", "Loser.tie", "Loser.record", "homeLocation"))

  wins$result <- "W"

  losses <- subset(df, Loser.tie==myteam, select=c("Date", "Winner.tie", "Winner.record", "Loser.tie", "Loser.record", "homeLocation"))

  losses$result <- "L"

  info <- rbind(wins, losses)

  info$winloss <- factor(info$result, levels=c("L", "T", "W"), ordered=TRUE)

  info$opponent <- ifelse(info$result == "W", info$Loser.tie, info$Winner.tie)

  info$opponent.record <- ifelse(info$result == "W", info$Loser.record, info$Winner.record)

  info <- orderBy(~-winloss + -opponent.record, info)

  info$id <- 1:nrow(info)

  info$homegame <- ifelse(info$homeLocation == myteam, "Home", "Away") 

  info$WLHA <- paste(info$result, info$homegame, sep=" ")

  return(info)

}

plotResults <- function (t1) {

  ### START GRAPHING ###

  # Rearrange data to graph wins first and then losses, each ordered by opponent record (best first)

  t1$winloss <- factor(t1$result, levels=c("L", "T", "W"), ordered=TRUE)

  t1 <- orderBy(~-winloss + -opponent.record, t1)

  # create an ordered ID field. Database force of habit.

  t1$id <- 1:nrow(t1)

  # Use the most common home location as the title; for a full season that is the team itself

  x <- t1$homeLocation

  ux <- unique(x)

  mytitle <- ux[which.max(tabulate(match(x, ux)))]

  # Now create the graph using ggplot2

  g <- ggplot(t1, aes(x=id, y=opponent.record, fill=WLHA)) +

    geom_bar(stat="identity") + 

    scale_x_continuous(breaks=t1$id, labels=t1$opponent) + 

    theme_minimal() +

    theme(axis.text.x = element_text(angle = 60, hjust = 1)) +

    labs(x="", y="", fill="Game result") + 

    ggtitle(mytitle) +

    scale_fill_manual(values=c( "darkseagreen", "forest green", "firebrick1", "firebrick4"), limits=c("W Home" , "W Away","L Home", "L Away"), labels=c("Home Win", "Road Win", "Home Loss", "Road Loss" )) + 

    geom_abline(intercept = 0.5, slope=0, colour="black", linetype="dashed")

  return(g) # function returns the graph

}

denver <- plotResults(get_info("Denver Broncos"))

seattle <- plotResults(get_info("Seattle Seahawks"))

patriots <- plotResults(get_info("New England Patriots"))

print(denver)

print(seattle)

print(patriots)
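The script above converts factor columns to numbers in two passes, first to character and then to numeric. Here is a minimal sketch of the same idea wrapped in a single helper. The name `factor_to_num` is hypothetical and not part of the original code, and the sample factor is made up.

```r
# Hypothetical helper: convert a factor (or character) column to numbers
# by way of its labels rather than its internal integer codes.
factor_to_num <- function(x) {
  as.numeric(as.character(x))
}

pts <- factor(c("27", "3", "34"))   # made-up scores stored as a factor

codes  <- as.numeric(pts)      # internal level codes, not the scores
scores <- factor_to_num(pts)   # the scores themselves

print(codes)
print(scores)
```

With a helper like this, each conversion in the script could be written in one line, for example `games$PtsW <- factor_to_num(games$PtsW)`.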
