I’m an avid sports fan and have followed basketball, football and baseball at both the collegiate and professional levels virtually all my life. But this year, something really cool happened. My eyes were opened to the great game of hockey, and just like that I’ve added another sport to my list of interests!
Growing up in Tennessee and Georgia did not really afford me the opportunity to catch a lot of live ice hockey games. But as the season began this year, I found myself attending the Boston Bruins’ practice sessions, and got to attend some games as well! What an incredible experience!
My task with the Bruins was to help them learn about and implement SAS Enterprise Guide, SAS Enterprise Miner and JMP. My responsibilities did not require that I “know the game,” but I still wanted to learn and contribute. So I started watching as many games as possible, reading articles, and talking to people who were longtime fans. I also tried to develop my understanding of the game through analytics, because I wanted to determine which statistics most affect winning or losing games.
For example, I wanted to investigate the statistic that most game broadcasts show right alongside the score: shots. It turns out that “shots” as depicted on those broadcasts does not count every time a player sends the puck toward the net, but rather “shots on goal.” In the NHL, shots are divided into three categories: a shot that has no chance of going in the net, a blocked shot, and a shot that is essentially on target and would have been a goal if not for the effort of a defensive player. The last category is shots on goal (SOG), and as they are so prominently displayed with the score, I assumed that SOG must be important!
To investigate the impact of an SOG, I took 10 years of data and plotted the average number of SOGs per game to see its relationship to wins… but that brought up another question in this quest. In the National Hockey League (NHL), and for the purpose of this analysis, how should I define a “win”?
In the other major sports, it’s purely the wins, losses and overall winning percentage that determine who’s going to win divisions or gain wildcard berths and move on to the playoffs. In the NHL, those outcomes are determined by “points.” A team earns 2 points for winning a game, or 1 point for losing a game in overtime, as a reward for taking the game beyond regulation. So instead of modeling against a team’s winning percentage, I focused on points. This meant that I was effectively awarding “half” a win for an overtime loss. To take it a step further, I considered the points earned versus the number of “possible points,” so that the dependent variable in my analysis would be a ratio:
Points Ratio = Points Earned / Possible Points
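Here’s a minimal sketch of how that ratio could be computed. The function name and the season totals in the usage note are hypothetical, and the sketch assumes that “possible points” means 2 points for every game played.

```python
def points_ratio(wins: int, overtime_losses: int, games_played: int) -> float:
    """Return points earned divided by possible points for one team-season.

    Assumptions (not stated in the text): possible points = 2 per game played,
    and regulation losses earn nothing.
    """
    if games_played <= 0:
        raise ValueError("games_played must be positive")
    if wins < 0 or overtime_losses < 0 or wins + overtime_losses > games_played:
        raise ValueError("wins and overtime_losses must fit within games_played")

    points_earned = 2 * wins + overtime_losses  # 2 per win, 1 per overtime loss
    possible_points = 2 * games_played          # assumed definition
    return points_earned / possible_points
```

For a made-up team with 45 wins and 10 overtime losses in 82 games, `points_ratio(45, 10, 82)` works out to 100 points earned out of 164 possible.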
Plotting the average SOGs per game (SPG) against the points ratio, we can see that a positive linear pattern exists, but there’s a lot of noise as well. A line was also fit, and an R² of 0.470 was observed. So, will SPG help me predict the points ratio a team will have over the course of a season? It does … but not very well.
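For anyone who wants to try this kind of fit outside of JMP, here’s a rough sketch of a least-squares line and its R² in plain Python. It is not the JMP procedure itself, and the function name is my own invention; it relies on the fact that, for a straight-line fit with an intercept, R² equals the squared correlation between the two variables.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared). For a simple linear fit with an
    intercept, r_squared is the squared Pearson correlation of x and y.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy ** 2) / (sxx * syy)
    return slope, intercept, r_squared
```

Here `x` would hold each team-season’s average SOGs per game and `y` the matching points ratio.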
Now, let’s back up just a bit: I’m a “Certified Sabermetrician,” and the most basic analysis I did during the process of gaining that certification in baseball analytics was to recreate Bill James’ Pythagorean Theorem for Baseball, which effectively relates the number of runs scored and runs allowed to a team’s winning percentage.
The Pythagorean Theorem for Baseball creates a trail of believability (which is very important in the analysis of any subject) and provides a fundamental metric relating winning percentage to the most basic elements of the game: the runs a team scores and the runs scored against the team. In the case of hockey, that means we’ll look at the goals a team scores and the goals scored against the team. General managers (GMs) in baseball, depicted famously in the movie “Moneyball,” started managing toward that ratio. They sought players who would either get them more runs or allow fewer runs, and they started estimating their winning percentage based on their personnel’s expected production.
So, I set out to look at “goals for” and “goals against” in the same light, and to create a “Pythagorean Theorem for Hockey,” with the primary difference being that I’m going to estimate the ratio of points earned to possible points instead of winning percentage. I’m using the following equation to create the estimate:
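A minimal sketch of such an estimate looks like this. It assumes the classic Bill James form with an exponent of 2, which is my assumption for illustration (the exponent is left as a parameter), and the function name is hypothetical.

```python
def pythagorean_points_ratio(goals_for: float, goals_against: float,
                             exponent: float = 2.0) -> float:
    """Estimate the points ratio from goals for and goals against.

    Uses GF^k / (GF^k + GA^k). The default k = 2 is an assumption borrowed
    from the classic baseball formula, not a value confirmed for hockey.
    """
    if goals_for < 0 or goals_against < 0:
        raise ValueError("goal totals cannot be negative")
    if goals_for == 0 and goals_against == 0:
        raise ValueError("at least one goal total must be positive")

    gf = goals_for ** exponent
    ga = goals_against ** exponent
    return gf / (gf + ga)
```

A team that scores exactly as many goals as it allows gets an estimate of 0.5, and the estimate rises toward 1 as goals for outpace goals against.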
Again, considering 10 seasons of data, I used Graph Builder in JMP to create the following graph, just as I did with the analysis of SOGs.
A coefficient of determination (R²) of 0.892 looks strong! I recall seeing values in my baseball research that were almost that good, but only after optimizing the exponent; I haven’t done that in this effort.
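If I were to optimize the exponent, one simple approach would be a grid search that picks the exponent with the smallest squared error between the estimate and the observed points ratio. The sketch below is only an illustration of that idea, not something I ran for this analysis; the function names, the search range of 1.0 to 3.0 and the step size are all assumptions.

```python
def estimate(goals_for, goals_against, exponent):
    """Pythagorean-style estimate: GF^k / (GF^k + GA^k)."""
    gf = goals_for ** exponent
    ga = goals_against ** exponent
    return gf / (gf + ga)


def sum_squared_error(records, exponent):
    """Total squared gap between observed and estimated points ratios.

    Each record is a (goals_for, goals_against, observed_points_ratio) tuple.
    """
    return sum((ratio - estimate(gf, ga, exponent)) ** 2
               for gf, ga, ratio in records)


def best_exponent(records, low=1.0, high=3.0, step=0.01):
    """Grid-search the exponent in [low, high] that minimizes squared error."""
    if not records:
        raise ValueError("records must not be empty")
    steps = int(round((high - low) / step))
    candidates = [low + i * step for i in range(steps + 1)]
    return min(candidates, key=lambda k: sum_squared_error(records, k))
```

Calling `best_exponent(records)` on a list of team-season tuples would return the best-fitting exponent on that grid; a finer step or a proper optimizer could refine it further.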
So what does this tell us? It tells us that in order to earn more points in the NHL, you must affect either the goals you score or the goals scored against you. While this may sound perfectly reasonable and maybe even unremarkable, what happens with such information is that GMs will start to consider “wins above replacement” when looking at free agents and trades. They will look at relative parameters and consider, for example, whether a player who is “as good” on defense will be more productive offensively.
While I’m not a GM, I’m going to continue my research. I wonder how I might use analytics to help a team score more goals? Or maybe I can improve the valuations of goalie stats and team defense?
I’ve used SAS Enterprise Guide to pull the data together, create variables and program this Pythagorean Theorem for Hockey, and I’ve used JMP to rapidly create visualizations of the analysis.
Next, we’ll use play-by-play data provided by the NHL to study “shot value,” and build on this foundational analysis.