


Sep 12, 2014

Score! A Pythagorean Theorem for Hockey


I’m an avid sports fan and have followed basketball, football and baseball at both the collegiate and professional levels virtually all my life. But this year, something really cool happened. My eyes were opened to the great game of hockey, and just like that I’ve added another sport to my list of interests!

Growing up in Tennessee and Georgia did not really afford me the opportunity to catch a lot of live ice hockey games. But as the season began this year, I found myself at the Boston Bruins’ practice sessions, and I got to attend some games as well! What an incredible experience!

My task with the Bruins was to help them learn about or implement SAS Enterprise Guide, SAS Enterprise Miner and JMP. My responsibilities did not require that I “know the game,” but I still wanted to learn and contribute. So I started watching as many games as possible, reading articles, and talking to longtime fans. I tried to develop my understanding of the game through analytics because I wanted to determine which statistics most affect winning or losing games.

For example, I wanted to investigate the statistic that most game broadcasts show right alongside the score: shots. It turns out that the shots shown on those broadcasts are not a count of every time a player sends the puck toward the net, but rather “shots on goal.” The NHL distinguishes among shots that miss the net, shots that are blocked, and shots that are on target, meaning they either score or would have scored if not for the goaltender. The last category is shots on goal (SOG), and because they are displayed so prominently with the score, I assumed that SOG must be important!

To investigate the impact of SOG, I took 10 years of data and plotted the average number of SOGs per game to see how it relates to wins… but that raised another question. In the National Hockey League (NHL), and for the purposes of this analysis, how should I define a “win”?

In the other major sports, it’s purely about wins, losses and overall winning percentage when determining who wins divisions, earns wildcard berths and moves on to the playoffs. In the NHL, those outcomes are determined by “points”: a team earns 2 points for winning a game, or 1 point for losing in overtime or a shootout, a reward for taking the game beyond regulation. So instead of modeling against a team’s winning percentage, I focused on points. This meant that I was awarding “half” a win for getting a game into overtime, regardless of the eventual outcome. To take it a step further, I considered the points earned versus the number of “possible points,” so that the dependent variable in my analysis would be a ratio:

$$\text{Point Ratio} = \frac{\text{Points Earned}}{\text{Possible Points}} = \frac{2W + OTL}{2 \times GP}$$

where W is wins, OTL is overtime and shootout losses, and GP is games played.

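A minimal sketch of that ratio in Python, assuming every game is worth at most 2 points as described above. The function name and the sample season are hypothetical, not the data used in this analysis.

```python
def point_ratio(wins: int, ot_losses: int, games_played: int) -> float:
    """Points earned divided by possible points (hypothetical helper)."""
    points_earned = 2 * wins + ot_losses  # 2 per win, 1 per overtime/shootout loss
    possible_points = 2 * games_played    # the most a team could have earned
    return points_earned / possible_points


# Hypothetical season: 50 wins and 10 overtime losses in an 82-game schedule
print(round(point_ratio(50, 10, 82), 3))  # prints 0.671
```

Dividing by possible points rather than games played keeps the ratio between 0 and 1, so seasons of different lengths stay comparable.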
Using the Graph Builder platform in JMP, I plotted SOGs per game (SPG) versus the Point Ratio and found the following:



We can see that a positive linear pattern exists, but there’s a lot of noise as well. I also fit a line and observed an R² of 0.470, meaning SPG accounts for less than half of the variation in Point Ratio. So, will SPG help me predict the Point Ratio a team will have over the course of a season? It does … but not very well.
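For a straight-line fit like this one, R² is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes it from scratch with ordinary least squares; it is not JMP’s implementation, and the SPG and Point Ratio values are made-up placeholders rather than the 10 seasons behind the chart.

```python
def r_squared(x: list[float], y: list[float]) -> float:
    """Coefficient of determination for an ordinary least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope and intercept of the least-squares line
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Unexplained (residual) variation vs. total variation around the mean
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical team-seasons: shots on goal per game vs. Point Ratio
spg = [27.5, 29.0, 30.1, 31.2, 32.8, 33.5]
ratio = [0.47, 0.56, 0.50, 0.61, 0.55, 0.63]
print(round(r_squared(spg, ratio), 3))
```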

Now, let’s back up just a bit: I’m a “Certified Sabermetrician,” and the most basic analysis I did during the process of gaining that certification in baseball analytics was to recreate Bill James’ Pythagorean Theorem for Baseball, which effectively relates the number of runs scored and runs allowed to a team’s winning percentage.

The Pythagorean Theorem for Baseball creates a trail of believability (which is very important in the analysis of any subject): it ties winning percentage to the most fundamental elements of the game, the runs a team scores and the runs scored against it. Baseball general managers (GMs), as depicted famously in the movie “Moneyball,” started trying to manage toward that ratio. They sought players who would either produce more runs or allow fewer, and they started estimating their winning percentage from their personnel’s expected production. In the case of hockey, the equivalent elements are the goals a team scores and the goals scored against it.
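For reference, Bill James’ formula estimates winning percentage from runs scored (RS) and runs allowed (RA) with an exponent of 2. The worked numbers below use a hypothetical team that scores 800 runs and allows 700:

$$\text{Win\%} \approx \frac{RS^2}{RS^2 + RA^2} = \frac{800^2}{800^2 + 700^2} = \frac{640{,}000}{1{,}130{,}000} \approx 0.566$$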

So, I set out to look at “goals for” and “goals against” in the same light, and to create a “Pythagorean Theorem for Hockey,” with the primary difference being that I’m going to estimate the ratio of points earned to possible points instead of winning percentage. I’m using the following equation to create the estimate:



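Assuming the hockey version keeps the baseball form, with goals for (GF) and goals against (GA) in place of runs and the exponent left at 2, the estimated Point Ratio is GF² / (GF² + GA²). A minimal Python sketch under that assumption follows; the function name and season totals are hypothetical.

```python
def pythagorean_point_ratio(goals_for: float, goals_against: float,
                            exponent: float = 2.0) -> float:
    """Estimated Point Ratio from goals for and goals against.

    Assumes the baseball Pythagorean form with a default exponent of 2.
    """
    gf = goals_for ** exponent
    ga = goals_against ** exponent
    return gf / (gf + ga)


# Hypothetical season: 250 goals for, 210 against, 82 games
estimate = pythagorean_point_ratio(250, 210)
# Multiply by possible points (2 per game) to convert back to standings points
print(round(estimate, 3), round(estimate * 2 * 82, 1))  # prints 0.586 96.2
```

Keeping the exponent as a parameter leaves room to tune it later instead of fixing it at 2.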
Again, considering 10 seasons of data, I used Graph Builder in JMP to create the following graph, just as I did with the analysis of SOGs.



A coefficient of determination (R²) of 0.892 looks strong! I recall seeing values almost that good in my baseball research, but only after optimizing the exponent; I haven’t done that in this effort.
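One simple way to optimize the exponent, sketched here as an assumption rather than the method used in the baseball research mentioned above, is a grid search that picks the exponent minimizing the squared error between predicted and actual Point Ratio. The function names, search range, step size and sample data are all hypothetical.

```python
def predicted_ratio(gf: float, ga: float, k: float) -> float:
    """Pythagorean Point Ratio estimate with exponent k."""
    return gf ** k / (gf ** k + ga ** k)


def best_exponent(seasons: list[tuple[float, float, float]],
                  lo: float = 1.0, hi: float = 3.0, step: float = 0.01) -> float:
    """Grid-search the exponent that minimizes the sum of squared errors.

    seasons: (goals_for, goals_against, actual_point_ratio) per team-season.
    """
    def sse(k: float) -> float:
        return sum((actual - predicted_ratio(gf, ga, k)) ** 2
                   for gf, ga, actual in seasons)

    # Evenly spaced candidate exponents from lo to hi inclusive
    n_steps = int(round((hi - lo) / step))
    candidates = [lo + i * step for i in range(n_steps + 1)]
    return min(candidates, key=sse)


# Hypothetical team-seasons: (GF, GA, actual Point Ratio)
sample = [(250, 210, 0.60), (200, 240, 0.43), (230, 230, 0.51), (270, 200, 0.66)]
print(round(best_exponent(sample), 2))
```

A grid search is easy to verify by eye; a nonlinear least-squares routine would reach a similar answer with fewer evaluations.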

So what does this tell us? It tells us that to earn more points in the NHL, a team must improve either the goals it scores or the goals scored against it. While this may sound perfectly reasonable, even unremarkable, such information lets GMs start to consider “wins above replacement” when evaluating free agents and trades. They can compare players on relative terms: for example, if two players are “as good” on defense, which one is more productive offensively?
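To make that concrete, the sketch below asks how many standings points ten extra goals scored, or ten fewer goals allowed, might be worth under the Pythagorean estimate. It assumes an exponent of 2 and an 82-game season; the function name and team totals are hypothetical.

```python
def estimated_points(goals_for: float, goals_against: float,
                     games: int = 82, exponent: float = 2.0) -> float:
    """Estimated standings points: Pythagorean Point Ratio times possible points."""
    gf = goals_for ** exponent
    ga = goals_against ** exponent
    return gf / (gf + ga) * 2 * games


baseline = estimated_points(250, 210)
more_offense = estimated_points(260, 210) - baseline    # ten extra goals scored
better_defense = estimated_points(250, 200) - baseline  # ten fewer goals allowed
print(f"+10 GF: {more_offense:.1f} points, -10 GA: {better_defense:.1f} points")
```

With these made-up totals the sketch reports about 3.1 points for ten extra goals scored and about 3.8 for ten fewer allowed. Under this formula, a team that already outscores its opponents gains slightly more from preventing a goal than from scoring one.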

While I’m not a GM, I’m going to continue my research. How might I use analytics to help a team score more goals? Or could I improve how goalie stats and team defense are valued?

I’ve used SAS Enterprise Guide to pull the data together, create variables and program this Pythagorean Theorem for Hockey, and I’ve used JMP to rapidly create visualizations of the analysis.

Next, we’ll use play-by-play data provided by the NHL to study “shot value,” and build on this foundational analysis.

Until the next puck drops!


Maybe we should think about doing this for our local team, the Carolina Hurricanes. :-)

So, being the ice hockey dad of an 18-year-old goalie who has played high-level travel hockey for the last 14 years, I can tell you about keeping track of his GAA (goals against average), the team's plus/minus stats, and where the shots went in (blocker side, five hole, and more). There are many factors that come into play. For netminders, much of it is mental, and of course focus and tracking. Then there's physical ability, push-off, speed, rebound control, and the list goes on. And sometimes it's just an off day: lack of rest, not feeling well, injury, etc.

From info captured during all sorts of skills tests at the gym and on the ice on weekends, the coach can try to make sense of this "hockey combine" type data (sprint speed, vertical jump height, lateral movement times, etc.). Coaches have to use all sorts of info to figure out how to form the lines: 1st line, 2nd line, 3rd line. Which players do they mix and match? Which have the highest minus rating (on the ice when a goal is scored against the team)? Which have the highest plus rating (on the ice when their team scores a goal)? Who do they put with whom? Which players gel the best together? Which don't? Why? Which combination would work best? Why?

For shots on goal, what is the quality of the shot? Speed, traffic in front, an odd bounce, how far out. There are so many factors involved. The Internet of Things may change some of this in the future, even below the NHL level. Imagine a sensor in the puck, or sensors on the players, to track all sorts of data more easily. Who was on the ice, and where, when the puck went in the net? How are these lines working out in real time? Who's been on the ice too long, and who hasn't had enough time? The list goes on. So much data!! So much fun to try and make sense of it all.


My visual analytical experience in Pittsburgh:


@sabisw Yes! Go Canes Go!

@ssavchenko What a wealth of knowledge you've shared in your post! We have worked with a team to help them understand the "best combinations of players" to have on the ice, as well as to identify shot value. And I learned from one of the teams we've worked with that the NHL has experimented with sensors in pucks, but a "high tech" puck is very expensive! Thanks for sharing the information!