Can we predict who will win the Super Bowl?

Peter_Hersh · Jan 30, 2018 12:14 PM

Who will win the Super Bowl? The statistical model says...

“I predict one of these two teams will win the Super Bowl.” -- Gilbert Gottfried, comedian

Whether you enjoy football, musical performances, funny commercials, food or smaller crowds at the grocery store on Sunday afternoon, the Super Bowl is for you. For the non-football fans among you, this year’s game is the 52nd Super Bowl. The match-up pits the perennial powerhouse New England Patriots against one of the more futile franchises in the NFL, the Philadelphia Eagles. If you are a fan of evil dynasties or big underdogs, this year’s match-up has it all. It's time to look at the data to try to determine the likely winner!

Will Vegas tell us?

I decided to pull in data from a couple of sources to see if I could predict a winner without bias. First, I pulled data from Vegas oddsmakers to see how often the favorite has won.

Super Bowl 1.png

Since the Super Bowl began following the 1966 NFL season, the Vegas favorite has won two-thirds of the time, or 34 out of 51. Pretty good, Vegas, but if we look at the last 10 Super Bowls, that trend has not kept up. Only three of the last 10 favorites have won the Super Bowl, so oddsmakers appear to be on a bit of a cold streak.
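To put a number on that cold streak, here is a small sketch that asks: if favorites really win at the long-run rate of 34 out of 51, how likely is it to see three or fewer wins in any given 10 games? It treats each game as an independent coin flip with the same win probability, which is a simplifying assumption of mine and not something the Vegas data guarantees.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

overall_rate = 34 / 51   # favorites' record over all 51 Super Bowls
recent_rate = 3 / 10     # favorites' record over the last 10

# Chance of 3 or fewer favorite wins in 10 games at the long-run rate.
# Assumes independent games with a constant win probability.
p_cold = binom_cdf(3, 10, overall_rate)

print(f"Long-run favorite win rate: {overall_rate:.1%}")
print(f"Last-10 favorite win rate:  {recent_rate:.1%}")
print(f"P(3 or fewer wins in 10):   {p_cold:.1%}")
```

Under those assumptions the probability comes out to roughly 2%. Keep in mind that the last-10 window was picked after looking at the results, so this understates how often a streak like this shows up somewhere in 51 games.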

Super Bowl 2.png

What about quarterbacks?

The next thing I looked at was quarterbacks: Tom Brady vs. Nick Foles. This one looks like a landslide, right? Well, I looked at how back-up QBs like Nick Foles have fared in the Super Bowl. To my surprise, back-ups (one of whom happened to be Tom Brady) are 8-2 in the Super Bowl, which is an even better winning percentage than that of Tom Brady, who has a 5-2 record. Vegas and QBs have not given me a clear indicator of who will win this game, so I decided to pull more data and dig in a little more.

Clustering offense and defense

I pulled in data from the NFC and AFC championship games to determine whether those results had an impact on the Super Bowl results. I also grabbed data from the regular season to see how well the offense and defense performed throughout the year and how that compared to past champions.

Now that I had the data, I decided to cluster in JMP to see what made up the typical Super Bowl champion. I looked at the data from 1970-present, skipping the four Super Bowls that happened before the merger.

Super Bowl 3.png

Most of the past champions, along with both of this year’s participants, are in Clusters 1 and 2. Let’s look at each cluster's average offensive and defensive ranks, based on points scored and points allowed per game.

Super Bowl 4.png

Cluster 1 and Cluster 2 represent very good teams with, on average, a top-5 defense and top-5 offense. Cluster 1 represents teams with slightly better defenses, and Cluster 2 represents teams with slightly better offenses. Both of this year’s participants are in Cluster 2. The Eagles rank fourth in scoring defense and third in scoring offense this year. The Patriots rank fifth in scoring defense and second in scoring offense this year.

If you look at the dendrogram, you can see that the Eagles and Patriots are very similar teams. Cluster 3 is made up of top-flight defenses with above-average offenses. Cluster 4 is teams that won despite having below-average offenses. Cluster 5 represents teams that won despite having bad defenses and above-average offenses.
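For readers who want to see the mechanics behind a dendrogram, here is a minimal sketch of agglomerative (hierarchical) clustering on offensive and defensive ranks. The clustering in this post was done in JMP, and its settings are not shown here, so the average-linkage rule below is my assumption for illustration. The Eagles and Patriots ranks are the ones quoted above; the other four teams are made-up placeholders, not real champions.

```python
import math

# (offense_rank, defense_rank) by points scored / allowed per game.
# Only the Eagles and Patriots rows come from the article; the
# "Hypothetical" rows are invented to give the algorithm something to do.
teams = {
    "2017 Eagles": (3, 4),
    "2017 Patriots": (2, 5),
    "Hypothetical A": (1, 2),
    "Hypothetical B": (12, 1),
    "Hypothetical C": (20, 3),
    "Hypothetical D": (4, 25),
}

def average_linkage(c1, c2):
    """Mean Euclidean distance over all cross-cluster pairs of teams."""
    dists = [math.dist(teams[a], teams[b]) for a in c1 for b in c2]
    return sum(dists) / len(dists)

def agglomerate(names, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[name] for name in names]
    merges = []  # (cluster_a, cluster_b, distance) in merge order
    while len(clusters) > n_clusters:
        d, i, j = min(
            (average_linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters, merges

clusters, merges = agglomerate(list(teams), n_clusters=3)
for a, b, d in merges:
    print(f"merge {a} + {b} at distance {d:.2f}")
print("final clusters:", clusters)
```

The merge order is what a dendrogram draws: teams that join early, at a small distance, sit next to each other on the tree. With these ranks the Eagles and Patriots are the first pair to merge.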

Vegas odds, QBs and clustering have not given me a clear picture of who will likely win the Super Bowl this year. That means it's time to make a predictive model in JMP, where we combine all the factors for every Super Bowl participant since 1970. I used 75% of the teams that participated in the Super Bowl to build the model (Training) and held out the other 25% to see how well the model did at predicting (Validation).
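The 75/25 holdout idea can be sketched in a few lines. This is an illustration of the splitting step only, not the JMP workflow: the team identifiers, the random seed and the rounding rule are all assumptions of mine. The one number taken from the post is the pool of teams, which covers the 47 Super Bowls after the merger (V through LI), or 94 participants.

```python
import random

# Super Bowls V through LI: the 1970-2016 seasons, two teams per game.
FIRST_SB, LAST_SB = 5, 51
teams = [(sb, conf) for sb in range(FIRST_SB, LAST_SB + 1)
         for conf in ("AFC", "NFC")]

# Shuffle with a fixed seed (arbitrary choice) so the split is repeatable.
rng = random.Random(52)
shuffled = teams[:]
rng.shuffle(shuffled)

# Hold out 25% of the teams for validation; fit only on the rest.
n_valid = round(0.25 * len(shuffled))
validation = shuffled[:n_valid]
training = shuffled[n_valid:]

print(f"{len(teams)} teams: {len(training)} training, "
      f"{len(validation)} validation")
```

The important property is that no team appears in both sets, so the Validation results reflect games the model never saw while it was being fit.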

Super Bowl 5.png

The model gets it right more than 90% of the time on the Training set. The Training set is like the post-game wrap-up: it is much easier to find crucial factors when you already know who won the game. To evaluate the model fairly, look at the Validation set, where the model predicts outcomes for teams it was not fit on. There, it gets the winner correct 74% of the time, which beats the two-thirds success rate of the Vegas oddsmakers!
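As a rough check on that comparison, the sketch below lines up the two accuracy figures and estimates how much the validation number could move by chance. The validation-set size of 24 is my assumption (about 25% of the 94 post-merger participants), and the standard-error formula treats each prediction as independent, which is only approximately true since the two teams in a game share an outcome.

```python
import math

vegas_accuracy = 34 / 51   # favorites' record from the Vegas data
model_accuracy = 0.74      # reported Validation accuracy
n_valid = 24               # assumed size of the Validation set

# Rough binomial standard error of the validation accuracy.
std_err = math.sqrt(model_accuracy * (1 - model_accuracy) / n_valid)
gap = model_accuracy - vegas_accuracy

print(f"Vegas: {vegas_accuracy:.1%}  Model: {model_accuracy:.1%}")
print(f"Gap:   {gap:.1%}  (standard error ~ {std_err:.1%})")
```

Under these assumptions the model's edge over Vegas is about seven percentage points, which is smaller than one standard error, so the two are closer than the headline numbers suggest.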

Who does the model pick to win?

I am sorry to say that the model picks the Patriots. But Eagles fans: Don’t lose hope; no model is correct all the time. In fact, this one is wrong 26% of the time.