
Predicting the Next NFL Passing Yards Leader: A Data Visualization Journey

Tom Brady is generally considered the best quarterback in NFL history. His record for most career passing yards is one of the many impressive records he has. To achieve this, he consistently maintained a high passing yards total per year, had the most passes attempted and completed, avoided season-impacting injuries, and had a long career overall.

Records are made to be broken.

The next quarterback to break this record might already be playing the game. But who? And when?

This presentation takes us on a data exploration and visualization journey to show how Tom Brady became such a dominant force in the NFL. We then identify which current quarterbacks have the greatest chance at challenging this record and when it might occur.

We cover the entire analytic workflow: a JSL script that scrapes data from a website; a data clean-up workflow that prepares the data for analysis; data exploration and visualization; and finally, a published JMP Live report that will be updated throughout the season.


Hello, I'm Scott Allan. I'm a systems engineer with JMP, and I'm pleased to be here with my colleague Yasmin Hajar to present our poster for Discovery Summit 2024 in Cary. Earlier this year, Yasmin and I were talking about Discovery Summit and how we wanted to present a data visualization and analysis workflow that goes all the way from data access to publishing in JMP Live. We work with customers every day, and we're seeing a lot of increased interest in fully automating this workflow, going from data import to analysis to publishing those results.

For this presentation, we're going to touch on a lot of different capabilities in JMP. We'll show how to pull data in from a website, how to use Workflow Builder to automate data updates and cleaning, how to use various JMP platforms to visualize, explore, or even model the data, and then how to publish those results to JMP Live. Then we're going to add a little bit extra and show how to do all of this with a single click of a button using a custom add-in.

For the project, we decided to use former NFL quarterback Tom Brady as the backdrop for this workflow. At the time we were thinking about ideas, I had read an article about how dominant his career was and how his records are going to last for a really long time. I started thinking: how long is a long time? Is the next record-breaking quarterback already playing, and if so, when are they going to challenge those records?

Tom Brady holds many different NFL records. For the purpose of this presentation, we're just going to focus on one of them: his career passing yards. But the same workflow could be done for any of his records. Neither Yasmin nor I is really an NFL superfan. We weren't sure who was in the running or who the candidates would be. As we started digging into this data, we learned a lot about what an outlier Tom Brady really is in terms of his career, and it's easy to see why he's considered one of the greatest of all time. This data exploration journey really starts with accessing that data. I want to hand it over to Yasmin to get us started.

Thank you, Scott. The very first step was to create the historical data, with all the stats from 1940 to 2023. You can see we had to access the Pro Football Reference website. We created our own JSL script that replaces the year variable over here with the corresponding year and concatenates all these statistics into one big raw data table for all those past years. That was our very first step.
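The year-substitution loop described here can be sketched roughly in Python. This is not the presenters' JSL script: the URL pattern, the `fetch_rows` callback, the `fake_fetch` stand-in, and the column names are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

# Assumed URL pattern with a year placeholder; the talk only says that the
# year is substituted into the Pro Football Reference address.
URL_TEMPLATE = "https://www.pro-football-reference.com/years/{year}/passing.htm"

Row = Dict[str, object]


def build_raw_table(
    first_year: int,
    last_year: int,
    fetch_rows: Callable[[str], List[Row]],
) -> List[Row]:
    """Fetch one season at a time and stack the rows into one raw table."""
    raw: List[Row] = []
    for year in range(first_year, last_year + 1):
        url = URL_TEMPLATE.format(year=year)
        for row in fetch_rows(url):
            # Tag each row with its season so seasons stay distinguishable.
            raw.append({**row, "Year": year})
    return raw


def fake_fetch(url: str) -> List[Row]:
    """Hypothetical stand-in for a real downloader/parser; returns dummy data."""
    return [{"Player": "Example QB", "Yds": 1000, "Source": url}]


table = build_raw_table(1940, 1942, fake_fetch)
print(len(table), table[0]["Year"], table[-1]["Year"])
```

Passing the fetcher in as a function keeps the concatenation logic separate from the download and parsing step, which is the part most likely to change if the page layout changes.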

Then we created a workflow using Workflow Builder, which first opens this historical data and then adds the current week's statistics to it. Once the 2024 season starts, we will be able to take those weekly stats and add them to the historical data. You can see that there are several extra steps, such as standardizing some column attributes and adding a career year column, an active year column, and a cumulative yards column so that we can sum all these yards.
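As a rough illustration of two of those derived columns, here is a minimal Python sketch that adds a career year and a running yards total per player. The column names (`Player`, `Year`, `Yds`) and the sample rows are made up, and career year is assumed to mean calendar seasons since the player's first recorded season; the actual workflow may define it differently.

```python
from collections import defaultdict
from typing import Dict, List

Row = Dict[str, object]


def add_career_columns(rows: List[Row]) -> List[Row]:
    """Return rows with 'Career Year' and 'Cumulative Yds' added per player."""
    by_player: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        by_player[row["Player"]].append(row)

    out: List[Row] = []
    for seasons in by_player.values():
        seasons.sort(key=lambda r: r["Year"])  # chronological order
        rookie_year = seasons[0]["Year"]
        running_total = 0
        for season in seasons:
            running_total += season["Yds"]
            out.append({
                **season,
                "Career Year": season["Year"] - rookie_year + 1,
                "Cumulative Yds": running_total,
            })
    return out


# Made-up sample rows, deliberately out of order.
rows = [
    {"Player": "QB A", "Year": 2021, "Yds": 4000},
    {"Player": "QB A", "Year": 2020, "Yds": 3000},
    {"Player": "QB B", "Year": 2020, "Yds": 2500},
]
result = add_career_columns(rows)
for r in result:
    print(r["Player"], r["Year"], r["Career Year"], r["Cumulative Yds"])
```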

But I want to point out that some of those steps, such as importing the current week and historical data, have a plus sign next to them. That means it is a custom step, which we had to create in JSL and paste into the workflow.

Once we created this workflow, we ended up with this up-to-date football stats table, with all the information up to the current week. Then we started using the visualization tools that we have. The first graph that we created was cumulative yards as a function of year, with all retired quarterbacks in gray, Tom Brady in purple, and the currently active quarterbacks in orange.

We can see instantly that some quarterbacks retired early; had they kept going, they might have beaten Tom Brady's record, or Tom Brady wouldn't have been able to beat theirs. On the active side, we can see some who already have a steep slope in their cumulative yards. We took only the active ones and plotted their cumulative yards as a function of career year, so that everyone is normalized to career year.

Then we started looking at which of these players are in the lead. We can see Aaron Rodgers is in the lead. However, he had a tough year last year, and who knows what's going to happen this season. On the other hand, Matthew Stafford and Russell Wilson, for example, are already accumulating more yards than Tom Brady did, if we compare by career year.

They are already ahead of that line when we compare them at the same career year. Derek Carr is somewhere over here. He's doing well, but he's still early in his career. We decided to plot things a little bit differently: how do these quarterbacks look in terms of average yards per year? We saw that Justin Herbert is really rocking his yards per year, with 4,300 yards per year compared to the rest of them, even compared to Tom Brady, who is way down here.

But if we look at this other graph over here, where we dynamically linked these players to see where they are in their careers, we see that Justin Herbert is still early in his career, with only four years accumulated. With that, Scott is going to show you the more advanced reports and models that we created.

The first step when we get new data is just to do some data exploration, like Yasmin showed. Another way to start thinking about which candidates might challenge these records is through other data exploration techniques, like clustering. What we did here was use hierarchical clustering to find quarterbacks that are similar not just in that single statistic of cumulative yards or yards per year, but across all of their characteristics.

In this case, we clustered all the quarterbacks. I think there are about 800 of them in this data table, and we clustered them with respect to their stats, like pass attempts, completions, interceptions, everything. What we're looking at here are the results in a constellation plot, which shows the groups of quarterbacks that have similar qualities across all the statistics.
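To make the idea of grouping quarterbacks by overall similarity concrete, here is a small agglomerative clustering sketch in plain Python. It is only an illustration: it uses average linkage on standardized columns, which is not necessarily the linkage method or scaling used in the JMP analysis, and the player names and numbers are made up.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List


def standardize(data: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Scale each statistic to mean 0 and standard deviation 1."""
    names = list(data)
    columns = list(zip(*(data[n] for n in names)))
    mus = [mean(c) for c in columns]
    sds = [pstdev(c) or 1.0 for c in columns]  # avoid dividing by zero
    return {n: [(v - m) / s for v, m, s in zip(data[n], mus, sds)] for n in names}


def agglomerate(data: Dict[str, List[float]], n_clusters: int) -> List[List[str]]:
    """Repeatedly merge the two closest clusters (average linkage)."""
    z = standardize(data)
    clusters = [[name] for name in z]

    def linkage(a: List[str], b: List[str]) -> float:
        # Average Euclidean distance over all cross-cluster pairs.
        return mean(math.dist(z[p], z[q]) for p in a for q in b)

    while len(clusters) > n_clusters:
        pairs = [
            (a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        ]
        a, b = min(pairs, key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Made-up stats: [pass attempts, completions, interceptions]
stats = {
    "QB A": [600, 400, 10],
    "QB B": [590, 395, 11],
    "QB C": [200, 110, 15],
    "QB D": [210, 120, 14],
}
groups = agglomerate(stats, n_clusters=2)
print(groups)
```

Standardizing first keeps a large-valued statistic such as pass attempts from dominating the distance calculation.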

We can see there are some major branches here that start differentiating different groups or classes of quarterbacks. We can see where Tom Brady lies in this. The plot doesn't tell us anything about their statistics or their capabilities. It just shows which quarterbacks are similar to each other across all those statistics.

We can zoom in on that one branch that has Tom Brady. We can certainly see that there are some familiar names here, some names that you probably saw on the prior graph, and maybe some others that might be surprising. We also kept in the retired quarterbacks or those that aren't active any longer. Those are the little gray dots.

You can see, if they had been in a different career year, they might have had some longevity. They might have been able to challenge some of these records, but they are no longer active. Their ability to challenge those records is over. We can see where Tom Brady is and start seeing now what other quarterbacks might be poised to start chipping away or challenging the records that Tom Brady has.

The next thing we wanted to do was start to model this data. The first two methods, data exploration and clustering, were really aimed at exploring the data and identifying the quarterbacks most likely to challenge a record. But in order to determine when those records might be challenged, we needed to build a model. There are a number of different models that you could use to do this, and we chose to build a mixed model.

We went to the Fit Model platform with the Mixed Model personality and used cumulative yards as the response, or Y variable, and career year as the X. Then we used the player name as a random effect and nested those coefficients. Essentially, what we get here is a simple linear regression of cumulative yards on career year for each quarterback, each with its own slope and intercept. From that information, we've got all of these different regression equations, so we can do an inverse prediction.
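A simplified way to picture one of those per-quarterback lines is an ordinary least-squares fit of cumulative yards against career year. The Python sketch below fits a single player in isolation; it is a stand-in for illustration, not the mixed model from the talk (a mixed model estimates all players jointly through random effects), and the data are made up.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need at least two matching (x, y) points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up player: gains exactly 4,000 yards per season.
career_year = [1, 2, 3, 4]
cumulative_yards = [4000, 8000, 12000, 16000]
intercept, slope = fit_line(career_year, cumulative_yards)
print(intercept, slope)
```

The slope is the player's estimated yards gained per career year, which is what drives how quickly the fitted line approaches a target total.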

We used the inverse prediction at Tom Brady's record of 89,214 yards. What we get from that inverse prediction is the number of years it would take each quarterback to achieve that many cumulative yards. We created a data table from that report and then normalized that data table with respect to each quarterback's rookie year. We get this graph over here on the right, which shows each quarterback, sorted in ascending order by the number of years it would take them to achieve that record.
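For a straight-line fit, inverse prediction amounts to solving the line for the career year at which it reaches a target total, then adding that to the rookie year. A minimal Python sketch, using a made-up intercept, slope, rookie year, and a round illustrative target rather than the actual record:

```python
import math


def years_to_reach(target_yards: float, intercept: float, slope: float) -> float:
    """Solve target = intercept + slope * career_year for career_year."""
    if slope <= 0:
        raise ValueError("slope must be positive for the target to be reached")
    return (target_yards - intercept) / slope


def season_reached(rookie_year: int, career_years: float) -> int:
    """Calendar season in which the target is crossed (career year 1 = rookie year)."""
    return rookie_year + math.ceil(career_years) - 1


# Hypothetical fitted line and rookie year, for illustration only.
years = years_to_reach(target_yards=90_000, intercept=-2_000, slope=4_000)
season = season_reached(rookie_year=2010, career_years=years)
print(years, season)
```

Rounding up with `math.ceil` reflects that a record is crossed during a season, so a fractional career year lands in the next whole season.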

You can see in this case, the two that we had identified, Aaron Rodgers and Matt Stafford, are the two orange dots there. You can see where Tom Brady is, the purple dot, and how he did it essentially in his 24th career year, which was back in, I don't remember exactly, 2022 or so. But what we see here is other quarterbacks as well. Certainly Aaron Rodgers and Matt Stafford are going to be the first ones to challenge that record, but it's still going to take them another 6 or 7 years to do that. Expect to see them in the 2031 or '32 seasons. They can start approaching his career yards record if they keep playing.

There's another generation of quarterbacks, earlier in their careers, that is going to start challenging that record even further out, maybe in the late 2030s or early 2040s. Don't hold your breath; this is not going to happen anytime soon. It was nice to see that, although no quarterbacks are imminently pressing this record, there are a number of them that might do so in the future. Now that we've explored and analyzed the data, we're going to shift gears to show how we're going to publish this data and share it, and then automate the entire workflow.

I'll pass it back to Yasmin to help us with that.

Thank you, Scott. As Scott was saying, our objective here was to publish some of these reports to JMP Live so that people can follow the changes in these reports as the season progresses. To do that, all we had to do was go to File, Publish, Publish Report to JMP Live. We selected the report that we wanted. In this case, I had a dashboard with a couple of graphs, and then we replaced the data table on JMP Live with the one that was updated with the statistics from the Pro Football Reference website.

To take things an extra step, we consolidated all these steps, including the publishing to JMP Live, in an add-in. Here we're showing the steps to create an add-in because of how easy it is to create one, and we wanted to share with others how to do that. To do so, we click on File, New, Add-in. Under the button that we want to press, we pasted the JSL script that we exported from our Workflow Builder. Once you create the workflow in Workflow Builder, you can export the JSL script, and then we copied and pasted it into here.

Taking things one step further, rather than leaving that add-in in the Add-Ins menu, we created our own custom menu so that it'll be easier for folks to click that button. With that, I'm just going to do a quick run-through of the steps. Here's our custom menu. Clicking on Run Stats, we're going to see the current year updating. It's standardizing the table, the columns, and so on.

It's going to create a subset of that table with the top players in it. It created the dashboard in here. I'm just going to quickly show JMP Live, which is automatically reloading with the newest and latest stats for the average yards per year and the yards versus player. That's all we had for the publishing and the add-in. I'm going to hand it back to Scott for his last comments.

That takes us through the entire workflow, from data access, cleanup, graphing, and modeling to publishing. If you want to follow along, we're also going to post this dashboard to JMP Public, maybe with a couple of other graphs that we find interesting. Check back, and we'll post a link to the JMP Public site in the community when that is up.

If you want to learn more about anything that you saw here, keep in mind that we really had just a high-level view of what we did. We just took a quick glimpse at the scripts and at the analysis. So if you want to learn more, please post in the community. We'd be happy to answer those questions. With that, I'd like to thank JMP for giving us the opportunity to do this poster and presentation. I want to thank Yasmin for working with me on this project. I had a really great time and learned a whole lot as well.

Thank you.