Level: Intermediate
Christian Stopp, JMP Systems Engineer, SAS
Don McCormack, Principal Systems Engineer, SAS
Generations of fans have argued over who the best Major League Baseball (MLB) players have been and why, often citing whichever performance measures best supported their case. Whether the measures were statistics from a particular season (e.g., most home runs) or cumulative over a career (e.g., lifetime batting average), such statistics do not fully convey a player’s performance trajectory. As the arguments progress, it would be beneficial to capture the inherent growth and decay of a player’s performance over a career and distill that information with minimal loss.
JMP’s Functional Data Explorer (FDE) has opened doors to new ways of analyzing series data and capturing the ‘traces’ associated with these functional data for analysis. We will explore FDE’s application to examining player career performance based on historical MLB data. With the derived scores we will see how well we can predict whether a player deserves their plaque in the Hall of Fame…or is deserving and has been overlooked, and we will compare these predictions with those based solely on the statistics of yore. We’ll confirm Ted Williams really was the greatest MLB hitter of all time. What, you disagree?! Must be a Yankees fan…

Auto-generated transcript

Christian: So thank you, folks, for joining us here today at the JMP Discovery Summit, the virtual version. My name is Christian Stopp. I am a JMP systems engineer, and I'm joined today by my colleague Don McCormack, who's a principal systems engineer for JMP as well. You probably got here because you saw the title of the talk: either you're a baseball fan and wanted in on a talk about Major League Baseball players, or you saw it was about Functional Data Explorer and wanted to learn a little more about how to employ it in different settings. So we're going to marry those two topics today. I'm going to gear my part of the conversation a little more toward the baseball fans first. Among baseball fans, you might think about how your favorite player does relative to other players, and you might have these conversations with your friends, hopefully kept polite, about who your favorite player is and why. That's how I imagine that infamous conversation between Alex Rodriguez and Jason Varitek went: just comparing notes about who their favorite player was.
For me, like Don, it started with a love for baseball and an interest in the statistics you'd find on the back of the bubble gum cards we used to collect. As you have these conversations about who your favorite player is, you might note that players differ not only in how good they are but also in how they age: when they peak and when their performance starts to drop off. So as you think about the career trajectories of these players, you might ask: how do I capture or model that performance over time? Now, if you're oddly like me, you decide to pursue statistics so that you can do exactly that. But I'd encourage you to skip that route, be smarter than me, and just use a tool like Functional Data Explorer to turn those statistical curves into numbers you can use for your endeavors. For those of you who are a little less familiar with baseball, what we'll be seeing is data reflecting measures of baseball performance. I'm going to be speaking about position players, and position players bat. One of the metrics of their batting prowess is on-base percentage plus slugging percentage, or OPS. On the y axis, I've got that measure for a couple of different players as they age: the blue is Babe Ruth and the red is Ted Williams. You get a sense from these trajectories that they both have about the same quality of performance over most of their careers. But you might notice that Ted Williams seems to peak at a slightly older age than Babe Ruth. And Babe Ruth, if you look at this plot without any other knowledge, seems to have needed a bit of time to get up to speed.
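For readers new to the stat, OPS is just on-base percentage plus slugging percentage. Here is a minimal Python sketch using the standard definitions; the season line is invented for illustration and is not taken from the Lahman Database.

```python
def on_base_percentage(h, bb, hbp, ab, sf):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


def slugging_percentage(total_bases, ab):
    """SLG = total bases / at-bats."""
    return total_bases / ab


def ops(h, bb, hbp, ab, sf, total_bases):
    """On-base percentage plus slugging percentage."""
    return on_base_percentage(h, bb, hbp, ab, sf) + slugging_percentage(total_bases, ab)


# Hypothetical season line (not a real player's numbers)
print(round(ops(h=150, bb=100, hbp=5, ab=450, sf=5, total_bases=270), 3))  # 1.055
```

Computing OPS season by season for each age gives exactly the kind of curve plotted for Ruth and Williams.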
This is just two players out of the tens of thousands you might be considering, and you can imagine there's a lot of variability in these characteristics of career trajectories. There's also clearly variability within a single player's trajectory. So I might use the smoother in Graph Builder to smooth out the noise in those curves a little and get a better sense of the signal in that player's trajectory. It turns out that this smoothing is very similar to what Functional Data Explorer does. So here I've got Functional Data Explorer, and again my metric is on-base percentage plus slugging percentage, OPS. FDE smooths out those player curves, as you can see, and then extracts information about what's common across them. What you get in return for every player are scores associated with that player's performance. These scores describe the player's career trajectory in a compact, quantitative way that we can take away and use in other analyses, as we'll be doing. You can see these here: these are Hank Aaron's scores. In the profiler that you can access in Functional Data Explorer, you can look at the trajectory of OPS over age and then set the score values to a player's scores to replicate that player's career trajectory. So that's a little bit about FDE and how to employ it here.
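The smooth-then-extract-scores idea can be sketched in a few lines of NumPy. This is not JMP's algorithm: FDE fits flexible basis models (such as splines) before its functional principal component analysis, whereas this sketch, under simplifying assumptions, swaps in a cubic polynomial smoother and a plain SVD on a shared age grid. The players and their OPS values are synthetic, and the function names are hypothetical.

```python
import numpy as np


def smooth_on_grid(ages, values, grid, degree=3):
    """Least-squares polynomial smoother (a simple stand-in for FDE's spline fits),
    evaluated on a grid shared by every player."""
    coefs = np.polyfit(ages, values, deg=degree)
    return np.polyval(coefs, grid)


def fpca(curves, n_components=2):
    """Functional PCA on curves sampled on a common grid (one row per player).

    Returns the mean curve, the leading eigenfunctions (one per row), and each
    player's scores on those eigenfunctions."""
    mean_curve = curves.mean(axis=0)
    centered = curves - mean_curve
    # Right singular vectors of the centered matrix are the eigenfunctions on the grid.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    eigenfunctions = vt[:n_components]
    scores = centered @ eigenfunctions.T
    return mean_curve, eigenfunctions, scores


def reconstruct(mean_curve, eigenfunctions, player_scores):
    """Rebuild a trajectory from scores, similar in spirit to the FDE profiler."""
    return mean_curve + player_scores @ eigenfunctions


# Synthetic OPS-by-age data for five hypothetical players with different peak ages
rng = np.random.default_rng(0)
ages = np.arange(21, 39)
grid = np.linspace(21, 38, 35)
curves = np.vstack([
    smooth_on_grid(ages, 0.9 - 0.002 * (ages - peak) ** 2 + rng.normal(0, 0.03, ages.size), grid)
    for peak in (27, 29, 31, 28, 30)
])
mean_curve, eigenfunctions, scores = fpca(curves, n_components=2)
print(scores.shape)  # (5, 2): two scores per player
```

Each row of `scores` plays the role of the per-player scores FDE reports, and `reconstruct` mimics dialing those scores into the profiler. The sign of each SVD eigenfunction is arbitrary, so only the relative positions of players' scores are meaningful.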
So you'll see Don and me talking about these statistics we're now equipped with: the player scores that Functional Data Explorer extracts from the curves we started with. We're going to use them to predict Hall of Fame status. That means not only confirming that players in the Hall of Fame belong there but, more interestingly, finding players who are in the Hall but maybe shouldn't be because the stats don't support it, or players whom the Hall of Fame committee seems to have snubbed. We'll talk a little about the different metrics we used and how we revised them, then about taking the career trajectories into FDE, getting the scores out, and doing the prediction as we normally would with other data. If you haven't followed baseball, the Hall of Fame eligibility requirements are that a player must have played at least 10 seasons and must then wait five years before becoming eligible. After that, there are 10 years during which the player is eligible and can be voted in. So there are a couple of players we'll see who are still waiting for the call. The Hall's selection criteria are primarily about how well the player performed, but they also take into account other things that our data source, the Lahman Database, doesn't include, so those are hard to measure. So we'll stick with analyses that reflect players' statistical prowess on the field. And of course, after 150 years of baseball, players have played in different eras, so we want to make sure we're comparing players to their peers.
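The eligibility rule as stated above can be encoded as a small check. This is a simplified sketch under the assumption that a player whose last season is year Y first appears on the ballot in Y + 6 (five full years after retirement, e.g. last season 2014, first ballot 2020). It deliberately ignores details the talk doesn't cover, such as minimum vote shares to stay on the ballot, the permanently ineligible list, and committee elections.

```python
def ballot_years(last_season):
    """Election years in which a player could appear on the ballot (simplified)."""
    first = last_season + 6  # five full years must pass after the final season
    return range(first, first + 10)  # ten-year eligibility window


def is_eligible(seasons_played, last_season, election_year):
    """At least 10 seasons played and the election year falls inside the window."""
    return seasons_played >= 10 and election_year in ballot_years(last_season)


print(is_eligible(seasons_played=20, last_season=2014, election_year=2020))  # True
print(is_eligible(seasons_played=9, last_season=2014, election_year=2020))   # False
```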
So we're going to take into account the year they played, as well as the position they played, since different positions typically carry different requirements. And different leagues have different rules, so we'll weigh that in too. That's where I'm going to stop. Don's going to kick over to pitching, and then I'll come back and talk about position players.

donmccormack: Like Christian said, I'm going to talk a little bit about pitching, but before that, I'd like to illustrate some of the initial points Christian mentioned. These are good data analytic practices, things that really need to be done regardless of the modeling technique you use, and it turns out they are good things to do before you model your data using FDE. I'm going to talk specifically about cleaning the data, normalizing the data so you can compare players equally, and finally modeling the data. As an illustration, what you see on the screen right now are three very different pitchers who are all in the Hall of Fame. The red line is Nolan Ryan: a very long career, about 27 years. The green line, the middle one, is Hoyt Wilhelm. Some of you younger folks might not know who Hoyt Wilhelm was; he pitched from the early '50s through '72, I believe. A fairly long career that spanned multiple eras. He was mostly a reliever, but not a reliever like the ones you know today. When he went out to relieve, he might pitch six innings, so he was very atypical compared with today's relievers. The blue line is Trevor Hoffman, a great closer for the San Diego Padres, but again a very different pitcher. So the question is: how do we get this data ready and set up so that we can compare all three of these pitchers equally?
The first thing I mentioned is that you want to clean up your data. By the way, I'm going to use four different metrics: WHIP (walks plus hits per inning pitched), strikeouts per nine innings, home runs per nine innings, and a metric I created called percent batters faced over the minimum, where I take the number of batters a pitcher faced, divide by the total outs they recorded, and subtract one. The idea is that if every batter faced made an out, the ratio would be exactly one, so the metric would be zero, a perfect score. I have different criteria for how I define my normalization and how I screen outliers; I'm including a PowerPoint deck with the details, but I won't go through them here for the sake of time. So the first thing I'm going to do is clean up the data. Notice, for example, that in his very first year Nolan Ryan pitched only three innings and had a very, very high WHIP, and there are a couple of seasons in which Trevor Hoffman pitched very little. So I'm going to start by excluding that data. That's nice: it shrinks the range, and it's always good to get the outliers out of the data before you do the analysis. One other step I want to mention: when I used FDE on this data, the platform let me do some additional outlier screening, and even when you're using multiple columns, it doesn't screen out the entire row; it only screens out the values for the given input, which is a very, very nice feature. I used that as well because, even after my initial screening, there were still a few anomalies I needed to get rid of. So cleaning the data is the first step. Normalizing it is the second. For normalization, I've normalized on both the x axis and the y axis.
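The four pitching metrics are simple ratios. A minimal sketch, assuming season totals are available as counts; working from outs recorded rather than innings avoids the baseball notation in which "200.1" means 200⅓ innings. The season numbers are hypothetical.

```python
def innings_from_outs(outs):
    """Innings pitched as a true decimal (three outs per inning)."""
    return outs / 3


def whip(walks, hits, outs):
    """Walks plus hits per inning pitched."""
    return (walks + hits) / innings_from_outs(outs)


def per_nine(count, outs):
    """Rate per nine innings, used for strikeouts (K/9) and home runs (HR/9)."""
    return 9 * count / innings_from_outs(outs)


def pct_batters_faced_over_min(batters_faced, outs):
    """Batters faced divided by outs recorded, minus one: zero if every batter made an out."""
    return batters_faced / outs - 1


# Hypothetical 200-inning season (600 outs)
print(whip(walks=50, hits=150, outs=600))       # 1.0
print(per_nine(200, outs=600))                  # 9.0
print(pct_batters_faced_over_min(27, outs=27))  # 0.0
```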
On the x axis, what we're looking at here is the number of seasons. Each season is taken as a separate whole entity, but we all know that pitchers throw more innings in some seasons than in others. So rather than using seasons as my entities, I'm going to use the cumulative percent of career outs. At the end of each season, a pitcher has recorded a certain number of cumulative career outs, and that's a certain proportion of his total career outs. I'm going to use that to scale my data. The great thing about that is that now all three pitchers are on the same x scale: everything runs from zero to one. From the standpoint of FDE analysis, that's a really nice thing to have. Then I want to scale on the y axis as well. All I've done is divide the WHIP by the average WHIP for the pitcher type and the era they pitched in, which gives me a relative WHIP. The other nice thing about using these relative values is that I know where my line in the sand is: a pitcher with a relative WHIP of one is an average pitcher. In this case, I'm looking for pitchers with relative WHIPs under one, and you'll notice that all three of these pitchers were under that line at one for most of their careers. The final thing I'm going to do is use FDE to model the trajectory. There are two problems with using the data as is. One is that it's pretty bumpy, and it would be really hard to estimate the career trajectory with all of these ups and downs. The second is that, eventually, I want to use the trajectory generated by FDE to come up with an overall career estimate.
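Both normalizations described above are short computations. A minimal sketch, assuming per-season outs and WHIP values are already in hand; the four-season career and the pitcher-type/era baseline of 1.30 are invented for illustration.

```python
from itertools import accumulate


def cumulative_career_share(outs_by_season):
    """x axis: cumulative share of career outs at the end of each season (ends at 1.0)."""
    total = sum(outs_by_season)
    return [c / total for c in accumulate(outs_by_season)]


def relative_whip(season_whips, baseline_whip):
    """y axis: season WHIP divided by the average WHIP for the pitcher's type and era.
    Values below 1 are better than average."""
    return [w / baseline_whip for w in season_whips]


# Hypothetical four-season career
outs = [9, 300, 600, 91]
print(cumulative_career_share(outs))  # [0.009, 0.309, 0.909, 1.0]
print(relative_whip([2.6, 1.3, 0.65], baseline_whip=1.3))
```

Because every career ends at 1.0 on the x axis, a three-inning debut season barely moves the pitcher along the career scale, while a full workhorse season moves him a long way.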
So rather than treating my seasons or my cumulative percentages as discrete entities, I want to model the metric over an entire continuous career, and we'll see that a little later on. So I'm going to replace my relative WHIP with this conditional FDE estimate. If you saw me flip back and forth between the two, you might say, oh boy, what a huge difference between the two; is that really doing a good job? It's hard to tell from that graph, so I'm going to show you what it looks like. Here I've pulled up the discrete measurements for Nolan Ryan, along with his curve for the conditional FDE estimate. You'll see that the curve doesn't follow the same jagged path, but it does a good job of estimating his career trajectory. In general, his WHIP was high at first: he walked a lot of people and was a very wild pitcher, much wilder early in his career, believe it or not. But as his career went on, that got better. You'll see this in any of the pitchers I picked. For example, let's go to Hoyt Wilhelm. Here's Wilhelm. Again, the curve doesn't capture the absolute highs and lows, but it does a good job of modeling the general direction of his career. So let's use that to ask some questions. I only have a limited amount of time, which is a shame, because there are some neat things I could show you. I'm going to start with what I call the snubbed. I used FDE on the four metrics I mentioned, used those as inputs along with pitcher type, and tried a whole bunch of predictive modeling techniques. The two that worked best for me were naive Bayes and discriminant analysis.
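The "snubbed" list for pitchers keeps the players that both classifiers vote into the Hall but the Hall has not elected. Once each model's yes/no call is available per player, that is simple bookkeeping. A minimal sketch, assuming the calls are stored as booleans in plain dictionaries; the function name, model keys, players, and predictions are all invented placeholders.

```python
def snubbed_by_consensus(model_calls, in_hall):
    """Players that every model votes in but the Hall has not elected.

    model_calls maps model name -> {player: predicted_in_hall};
    in_hall maps player -> actual Hall of Fame status."""
    return sorted(
        player
        for player, inducted in in_hall.items()
        if not inducted and all(calls[player] for calls in model_calls.values())
    )


# Invented placeholder players and predictions, for illustration only
calls = {
    "naive_bayes": {"A": True, "B": True, "C": False},
    "discriminant_analysis": {"A": True, "B": False, "C": True},
}
hall = {"A": False, "B": False, "C": True}
print(snubbed_by_consensus(calls, hall))  # ['A']
```

Requiring agreement from every model makes the list conservative: player B, backed by only one of the two models, is not flagged.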
I used those two techniques to tell me who should be in and who shouldn't. What we're looking at here are the pitchers for whom both naive Bayes and discriminant analysis said yes, but the Hall of Fame said no. These are my snubbed. Let me switch views: this is the raw WHIP, this is the relative WHIP, and let's go with the conditional WHIP. Let me put that reference line back in at one, and you'll see that, for the most part, these are pitchers who spent the bulk of their careers under that line. The other thing you might think, looking at this data, is that it would be really hard to tell these players apart. If I were to put a few pitchers who are in the Hall into this list, it would be hard to separate them just by eyeballing, because for part of their careers some would be better than others, and on other parts of their careers they would switch. How do I deal with this at the career level? As I mentioned earlier, one of the nice things about Functional Data Explorer is that I can take the data, create a career trajectory, and estimate as many data points along that trajectory as I want. I did that: I broke each career into 100 units and summed over all 100 units for each of my curves. So basically, I got something like an area under the curve: above the line at one, I'd add; below the line, I'd subtract. And if we look at total career trajectories, this is a larger list: approximately 1,300 or 1,400 pitchers, so
absolutely everyone who was Hall eligible, with 10 or more years. So let's really quickly go into a couple of things we can do with this. Let me start by looking at the players who were snubbed. These are sums of 100 values, so the line in the sand here is 100, because I've got 100 different values measured. You'll notice that, for the most part, these players were above 100. Here's the list of the players who didn't make it. If you look at these players, you'll notice a couple of obvious names, like Curt Schilling and Roger Clemens, who aren't in for non-career reasons, some of the other criteria Christian mentioned. There are some, for example Clayton Kershaw, who aren't done with their careers yet. But there are certainly other people you might consider who are Hall eligible. So let's look at those folks who are Hall eligible but have not been elected: B.J. Ryan; again Curt Schilling; Johan Santana, and I'm not sure why he didn't make it in; Smokey Joe Wood, a pitcher from the early part of the 1900s; and so on. So FDE's ability to let me extract values from anywhere along a career trajectory is an extra tool for estimating additional criteria for who belongs in the Hall and who doesn't. Enough said about the pitchers. I'm going to turn it back to Christian so we can talk a little more about the position players.

Christian: Excellent. Thank you, Don. Right.
Okay, so Don was talking about the pitchers, and I'll be looking at the position players. There are two different components that go into that: your batting prowess and your fielding prowess. I took a slightly different approach from Don's in choosing the statistics and building models. I started off with just four of the more common batting statistics, the first four on the list here, some of what you'd find on the backs of baseball cards. Then, as I progressed, as we'll see, I needed something to capture stolen bases, because the first four don't do that at all. So I created a metric I call base unit average (BUA), which brings in other base-runner movements to give the batter credit for those things. Fielding is, of course, a factor as well, as we'll see, so I included a couple of fielding metrics. Like Don mentioned earlier, I wanted to make sure I compared apples to apples, so I'm looking at these statistics relative to position, league, and year. And like Don, I wanted to make sure I weighted the smaller sample sizes appropriately so they weren't gumming up the system. So I weighted each player's performance by his plate appearances relative to the average for that league and year at a particular lineup slot, that is, how many plate appearances that slot should get over the course of a season. So that's how they're weighted. Let's see what that looks like. We're going to revisit Ted Williams. Here's Ted Williams's career on the left, the raw values we saw earlier, and it looks like he had a really poor season here.
But once you take the relative component, you can see it's actually an average season: as with Don's relative measures, one is average, and he's still above that line. So it was just a poor season by Ted Williams's own standards. We also saw earlier that these two peaks looked like they might be Ted Williams's peak performance, but it turns out those are seasons in which he had a smaller number of plate appearances because he went off to serve in the Korean War. That affected his scores, so I weighted them back toward the average because of the smaller sample sizes. So those are the types of data. I'm going to focus on just the relative statistics here, and on some of the things that caught my eye. We need the table of numbers fed in from FDE: here are the scores we're going to be looking at, the relative FPC (functional principal component) scores from FDE. The first thing I did was include four variables in my model, those first four batting statistics, and I wanted to make sure I had the right components in my analysis. The y axis here is the model-driven probability of being in the Hall of Fame, and the x axis is whether or not the player actually is in the Hall of Fame. So my misclassification areas are these two sections here, and I noticed there were more players down here than I was expecting. So I explored variables I hadn't yet included, like stolen bases. I'll pop those in for color and size, where color and size reflect the number of stolen bases over each player's career. As you can see, it seemed pretty clear to me that stolen bases are definitely a factor the Hall of Fame voters were taking into account. This is what drove me to create the base unit average statistic that I then used.
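The talk says partial seasons were weighted back toward the average according to plate appearances, but it doesn't give the formula. The sketch below is one hypothetical way to do it: a linear shrinkage whose weight is the ratio of actual to expected plate appearances for the league, year, and lineup slot, capped at 1. Both the function name and the weighting scheme are assumptions, not the method used in the talk.

```python
def shrink_toward_average(relative_value, plate_appearances, expected_pa):
    """Pull a relative stat (1.0 = league/year/lineup-slot average) toward 1.0
    when a player had fewer plate appearances than a full season at his slot.

    The weight min(PA / expected PA, 1) is an illustrative choice, not the talk's formula."""
    weight = min(plate_appearances / expected_pa, 1.0)
    return weight * relative_value + (1 - weight) * 1.0


# A hot partial season: relative OPS of 1.5 in half the expected plate appearances
print(shrink_toward_average(1.5, plate_appearances=300, expected_pa=600))  # 1.25
print(shrink_toward_average(1.5, plate_appearances=700, expected_pa=600))  # 1.5
```

Under this scheme, a short season like Williams's war-shortened years keeps its direction (above or below average) but loses much of its magnitude, which is the effect described above.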
So, as I described, I started off with four statistics and then added the BUA statistic as I explored those models; that's my x axis now. Then I added fielding statistics. What we have here is a parallel plot in which the y axis is again the model's probability that the player should be in the Hall of Fame, and each line is a player. The color represents Hall of Fame status: red means yes, already admitted, and blue means no. I like this plot because it lets me see who's moving, that is, the impact of the additional variables in the model. The first thing that caught my eye was this player here, who jumped from a not-really to a high probability of being elected to the Hall of Fame once the stolen base component was added. It's Rickey Henderson, who happens to be the career leader in stolen bases. Another player, looking at the defensive side of things, is Kirby Puckett. Based on the initial model, his statistics suggest he qualifies, just across the line. But when you add the stolen base component, he no longer seems to qualify. Then, once we put in the fact that he was a really good fielder who won a number of Gold Gloves playing center field for the Twins, he's back in the good graces of the Hall of Fame committee, and rightfully voted in. This plot is kind of messy; there's a lot going on here. So I added a local data filter to look at each position individually. Here, for first base, it's a lot easier to see the players in red and the players in blue.
Now we've got somebody here, Todd Helton, who all the models we looked at suggest should be admitted to the Hall of Fame, and he's still eligible, so he's still waiting for the call. Then there's someone like Dick Allen, also blue, not in: based on the FDE statistics we're using, the models suggest he belongs in the Hall. And there are others in red down at the bottom, like Jim Thome, who the models suggest doesn't really belong, but he was voted in. So these are different ways of exploring those relationships as we add predictors. Now, like Don, I wanted to get a sense of who was snubbed and who might have been gifted, or at least had non-statistical components in his consideration. Like Don, I ran a number of models and settled on the four that did the best predictive job. And like Don, rather than using only age as the x axis in FDE, I also used cumulative percent of plate appearances. Having these two variants gave me eight models to look at. So I drilled down to the players who are in the Hall of Fame although none of the eight models suggests they should be; that's this line here, and there are 31 of them. The reverse side, in green here, are the players who are not in the Hall even though the majority of the models in either bucket, age or cumulative percent of plate appearances, suggest they belong. I pulled all these players out and, just like Don, wanted to compare their trajectories: are they close, at least, or is something else going on? So you can see this here, again on on-base percentage plus slugging percentage, OPS.
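The position-player selection uses two rules over eight models: four model types crossed with two FDE x-axis choices (age and cumulative percent of plate appearances). A minimal sketch, assuming boolean calls grouped by bucket; the function names, model keys, and players are hypothetical. "Majority" is read here as a strict majority (3 of 4), and "either bucket" is read as each bucket. Replace `all` with `any` in `snubbed_by_majority` for the looser reading.

```python
def gifted(calls_by_bucket, in_hall):
    """Hall members that no model in any bucket votes in."""
    return sorted(
        p for p, inducted in in_hall.items()
        if inducted and not any(
            calls[p] for bucket in calls_by_bucket.values() for calls in bucket.values()
        )
    )


def snubbed_by_majority(calls_by_bucket, in_hall):
    """Non-members that a strict majority of models votes in, within every bucket."""
    def majority_yes(bucket, player):
        yes_votes = sum(calls[player] for calls in bucket.values())
        return yes_votes > len(bucket) / 2

    return sorted(
        p for p, inducted in in_hall.items()
        if not inducted and all(majority_yes(b, p) for b in calls_by_bucket.values())
    )


# Hypothetical calls: two buckets x four models, three invented players
calls_by_bucket = {
    "age": {
        "m1": {"P1": True, "P2": False, "P3": True},
        "m2": {"P1": True, "P2": False, "P3": True},
        "m3": {"P1": True, "P2": False, "P3": False},
        "m4": {"P1": False, "P2": False, "P3": False},
    },
    "cum_pct_pa": {
        "m1": {"P1": True, "P2": False, "P3": True},
        "m2": {"P1": True, "P2": False, "P3": True},
        "m3": {"P1": False, "P2": False, "P3": True},
        "m4": {"P1": True, "P2": False, "P3": False},
    },
}
in_hall = {"P1": False, "P2": True, "P3": False}
print(gifted(calls_by_bucket, in_hall))               # ['P2']
print(snubbed_by_majority(calls_by_bucket, in_hall))  # ['P1']
```

P3 shows why the reading matters: three of four cumulative-percent models vote him in, but only two of four age models do, so he is flagged only under the "any bucket" interpretation.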
It certainly looks like the snubbed players, in red with the plus signs, performed a lot better on this metric, and as it turns out on every other offensive metric too, than the gifted players, those who are in but whom the models suggest shouldn't be. That made me wonder whether it's just the offensive stats, and whether fielding is where the players already in the Hall shine. But based on fielding percentage, at least, the snubbed players still come out ahead; the gifted players still don't look like they belong as much as the snubbed players do. Only on the range factor component did the tide reverse, with the gifted players outperforming the snubbed players. That's another take, much like Don's, that you can use to evaluate the components included in your model. There are a lot of different ways to look at the data here. So, wrapping up, because I'm sure some of you are burning to know who was snubbed and who was gifted: these are some of the position players who were snubbed. As Don mentioned for some of his pitchers, a few of these players are banned from baseball, so they're not exactly snubbed; you probably recognize some of them. And these are some of the players who were gifted, or at least for whom their statistics alone may not be what got them into the Hall of Fame. So, to wrap up where we've been: we've been able to take players' career performance trajectories on a chosen metric, put them into Functional Data Explorer, and get out numerical summaries that capture the essence of those curves.
And then, in turn, we can feed those scores into the traditional statistical techniques we're familiar with. So now we can change the question from how to model or quantify a career trajectory to what we want to explore with the FPC scores we've got. We hope you enjoyed this conversation about baseball and the interplay of baseball, JMP, and FDE, and we hope you feel empowered to take the FDE tool available in JMP Pro and address questions with data, like who your favorite player is and why, with the means to back it up. Thanks for joining us. Take care.

donmccormack: Okay, so how do we deal with cases where we need to look at somebody's whole career trajectory? Are there other metrics that let us make these comparisons, so that we can tell these really fine gradations apart? As I alluded to earlier, we can look at any point along a player's career trajectory, at any level of granularity we want. I took 100 values between zero and one, the start and the end of the career, and summed over all of them. The nice thing about this technique is that I can do it for multiple metrics. What we're looking at here is a plot of all four metrics on one graph. Let's go back again to the group of pitchers who were snubbed, these folks here. If we look at them, we see that they had a low WHIP; by the way, the reference is 100 in this case because there were 100 observations. Home runs per nine, you want that low; percent batters faced over the minimum, low; and strikeouts per nine innings, you want on the high side. You'll notice that's the pattern these pitchers follow.
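The career-level summary is a sum of a smooth relative-metric curve evaluated at 100 points from career start (0) to career end (1). A minimal sketch, assuming the curve is available as a vectorized function; evenly spaced points including both endpoints are an assumption, since the talk only says 100 values between zero and one. Any smooth fit could stand in for the FDE estimate; the two curves below are hypothetical.

```python
import numpy as np


def career_sum(curve, n_points=100):
    """Evaluate a smooth relative-metric curve at n_points evenly spaced career
    positions from 0 (start) to 1 (end) and add the values up.

    With relative metrics, an exactly average career sums to n_points (here 100)."""
    x = np.linspace(0.0, 1.0, n_points)
    return float(np.sum(curve(x)))


def career_sum_vs_average(curve, n_points=100):
    """Same quantity centered at zero: above-average stretches add, below subtract."""
    return career_sum(curve, n_points) - n_points


def average_pitcher(x):
    return np.ones_like(x)


def improving_pitcher(x):
    # Hypothetical relative WHIP that falls (improves) from 0.95 to 0.85 over the career
    return 0.95 - 0.1 * x


print(career_sum(average_pitcher))               # 100.0
print(round(career_sum(improving_pitcher), 6))   # 90.0
```

For WHIP, home runs per nine, and percent batters faced over the minimum, sums below 100 are better than average; for strikeouts per nine, sums above 100 are.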
The interesting thing here is that I can apply any criteria I want. For example, let's say I consider all my players but only those with a WHIP sum below 100, so better than average; actually, let's make it even better than that, say 90 or below. Then let's look at those who have at least an average number of strikeouts per nine innings, and whose percent batters faced over the minimum is at or below 100. I'll disregard home runs per nine here. You can also standardize by the number of seasons, and I've done exactly that: I want players with at least 10 season equivalents, where a season equivalent is based on what an average player's season looked like. And finally, there's the workload they carried over their entire career: let's say we want someone who had at least 80% of an average workload, or, a little stricter, about the same workload. Again, we can use different criteria to weed out the pitchers we don't think we should consider and keep the ones we do. On top of those criteria, let's also restrict to pitchers who are not in the Hall of Fame. So here we go: now we have a list of people worth considering. You'll notice quite a few names that probably shouldn't surprise you. These are pitchers who are either not in the Hall yet because they're still playing, or who have just been disregarded. Chris Sale, for example, is still pitching. Curt Schilling, for obvious reasons, is not in the Hall. Johan Santana: why isn't he in the Hall? He was part of the group that was snubbed.
So the nice thing about using FDE is that you can turn the data into career trajectories and then use an additional metric derived from them to help judge Hall worthiness and non-worthiness.
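The screening step amounts to a filter over per-pitcher career sums. A minimal sketch, assuming each pitcher has already been reduced to career sums over 100 points (100 = average) plus season equivalents and a workload ratio. The field names, the `worth_considering` helper, and the default thresholds are hypothetical, following one reading of the example criteria above. Home runs per nine are ignored, as in the talk.

```python
from dataclasses import dataclass


@dataclass
class PitcherSummary:
    """Career sums over 100 points for each relative metric (100 = average),
    plus season equivalents and workload. Field names are hypothetical."""
    name: str
    whip_sum: float
    k9_sum: float
    pbfm_sum: float            # percent batters faced over the minimum
    season_equivalents: float
    workload_ratio: float      # career workload relative to an average pitcher
    in_hall: bool


def worth_considering(pitchers, max_whip=90, min_k9=100, max_pbfm=100,
                      min_seasons=10, min_workload=1.0):
    """Keep pitchers who meet every criterion and are not already in the Hall."""
    return [
        p.name for p in pitchers
        if p.whip_sum <= max_whip
        and p.k9_sum >= min_k9
        and p.pbfm_sum <= max_pbfm
        and p.season_equivalents >= min_seasons
        and p.workload_ratio >= min_workload
        and not p.in_hall
    ]


# Invented pitchers, for illustration only
pitchers = [
    PitcherSummary("Pitcher X", 85, 110, 92, 12, 1.05, in_hall=False),
    PitcherSummary("Pitcher Y", 85, 110, 92, 12, 1.05, in_hall=True),
    PitcherSummary("Pitcher Z", 95, 120, 90, 14, 1.20, in_hall=False),
]
print(worth_considering(pitchers))  # ['Pitcher X']
```

Keeping the thresholds as keyword arguments makes it easy to loosen or tighten one criterion at a time, for example `min_workload=0.8` for the 80% workload variant.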