Our World Statistics Day conversations have been a great reminder of how much statistics can inform our lives. Do you have an example of how statistics has made a difference in your life? Share your story with the Community!
In my previous post, I showed how we explored the eras of baseball using a simple scatterplot that helped us generate questions and analytical direction. The next phase was figuring out how I might use analytics to aid the “subject matter knowledge” that had been applied to the data. Could I confirm what had been surmised regarding the eras of the game? And, might we use analytics to surface more periods of time in the game that we should investigate, or at least, attempt to explain?
Generalized Regression is a method that we might use to build a predictive model, but in this case I wasn’t seeking to build a predictive model. I wanted to take advantage of the variable selection capabilities that are available in the Lasso Option of this method. I wanted to see which variables were selected as “significant and important,” and in what order. I thought that what’s selected will ultimately be the years in which I’m interested – that is, I thought I would confirm my assessment of the eras of the game and probably find some interesting information that my “eyeball” method had overlooked.
Longtime SAS associate and friend Tony Cooper is one of the people who talks baseball with me. Besides being a sports fan in his own right, Tony is an expert JSL (JMP Scripting Language) programmer. In mere minutes, he created a script to help me develop a set of dummy variables that enabled better differentiation of run production through the years. It was a key to the results!
In Generalized Regression, I was looking for a method that would help me surface “change,” and I had seen this tool in action in other areas. My expectations were high, and this method did not disappoint. In fact, the results were exciting, indicating that this method could uncover “change” not only in baseball data but also in other areas!
Utilizing the JMP platform for Generalized Regression enabled me to walk through the years of the game and confirm suspicions about the eras of the game. And it also found what might not necessarily be an “era” but a demonstration of the effects of expansion in specific years, or (for example) the act of raising the mound in 1968 and then lowering it in 1969.
The graph and table shown below tells a story: Solution Path shows the order at which variables enter the model. And, in the accompanying table, the estimates tell us if RPG (Runs Per Game) went up or down in the associated year.
Just like the scatterplot, this visualization led to questions as each year was presented. For example:
What was going on in 1993? Answer: It could have been the start of the “Steroid Era,” or it could be due expansion (two new teams joined the league) and an indication of the dilution of pitching talent as runs went up, or both!
What was happening in 1942? Answer: Runs went down as major league players left the league to join the military. The period from 1942 to 1946 is generally referred to as the “War Years.”
What was happening in 1920 and 1921? Answer: Baseball was leaving the “Dead Ball Era” for the “Live Ball Era.” Both players (like Babe Ruth) and events affected the game.
What about 1963? Answer: Baseball historians traditionally point to this year as the beginning of the period of expansion, and some call the period from 1963 to present the “Expansion Era.”
And finally, why were runs down in 2010? Answer: Baseball had cracked down hard on steroid users, and it may have caused a decrease in overall runs.
So, while having some fun playing with data from a favorite game, we were also able to demonstrate the effectiveness of generating questions and analytical direction using the simple scatterplot. We then showed how we could surface information at even greater levels using Generalized Regression. Potential applications to areas outside the arena of sports are endless. How might you use Generalized Regression?
Interested in seeing more?
You can learn more about the JSL script Tony Cooper created to aid in the analysis (and download it for yourself) by visiting the JMP File Exchange.
Follow the analysis using the Generalized Regression method by watching this step-by-step video: