Today we're going to talk about
what provides the best opportunity for a comeback in the NBA.
So my name is Weston Salmon.
I'm currently a student at Oklahoma State University
studying Business Analytics and Data Science in our Master's program.
And my name is Zach Miller.
I'm also a student at Oklahoma State University
and I'm also studying for my Master's in Business Analytics and Data Science.
All right, so we're going to cover the table of contents real quick,
which shows what we're doing throughout the presentation.
First we're going to begin with an introduction
that discusses why we're here and what exactly the study is for.
Then we're going to jump into our data and methods,
which looks at what the data looks like,
how did we complete our analysis,
and the different ways we manipulated our data.
We'll then look at the descriptive and predictive portions,
which show what we can derive from the data as it sits
and what our predictions look like.
Then at the end,
we'll conclude the presentation
and look at what the implications of the analysis are
and how it can be used by NBA teams in the future.
Here's a quote by Gabe Frank.
He's the Director of Basketball Analytics with the San Antonio Spurs.
We thought this would be a good quote
to include, as it deals with [inaudible 00:01:16]
and also the NBA in general.
So he said,
"I think analytics have grown in popularity
because it can give you a competitive advantage
if you do it well.
Every little bit helps."
Throughout our presentation, we're going to discuss
NBA analytics and how they can produce comebacks,
based on the data that we found.
We thought this quote really spoke to the overall objective of our project.
Now I'll pass it off to Zach to introduce the project as a whole.
Thanks, Weston.
Going into the introduction here: going into an NBA season,
every team has one common goal, and that's to win the championship.
Going into the season, a lot of teams hope for 40-plus wins
in their 82-game season,
but a great season typically results in 50-plus wins.
Our primary interest for this presentation and analysis
is in the hard-fought victories, also known as comeback victories,
by these different teams.
We define a comeback victory
as a win by a team that was losing by 10 or more points at halftime.
Every so often, we saw teams that were down by 10 or more at halftime
take the lead by the end of the game and ultimately win.
Finally,
our analysis uses play-by-play and salary data sets,
which we will go into in a little more detail in just a few minutes.
Now we're going to discuss the two business questions
that we want to answer using the two data sets.
First, looking at the play-by-play data,
we want to know exactly which plays or sequences of plays within a game
give a team the greatest chance at a comeback victory.
We want to know exactly what players and coaches can do
and draw up in the lineup to produce a comeback.
Then, with the salary and career stats data,
we want to see how those variables
can be used to determine how involved a player should be in comeback victories.
So not just how they actually perform,
but also how they should perform based on these variables:
which players are underperforming
and which are overperforming according to their contract and track record?
Next we'll discuss the data and methods that we used for both data sets.
As I said, we had two data sets.
The first one focused on play-by-play data.
This data contains the process and outcome of every play
in every game from 2015 through 2018.
So it includes all 30 NBA teams and exactly what they did
on every single play of every game during those years.
Zach will also talk about the salary data that we have.
Right, so the salary data contains the salary information
for each of the players
who were mentioned in the play-by-play data.
Whether a player mentioned in the play-by-play data
played one minute of NBA time or hundreds of minutes,
they appear in the salary data.
I could then see what their career stats and salary were
for the seasons covered by the play-by-play data.
Now we're going to look at the key variables
that we found within the play-by-play data set.
You can see things such as comeback,
halftime deficit, shot distance, outcome type, and rebound type.
But one of the two that we want to focus on in particular is comeback.
That's a variable [inaudible 00:04:51];
it was our flag variable and the outcome we were predicting.
We assigned a one to every game
where a team trailed by 10 or more points at halftime
and came back and won, and a zero otherwise,
because, as we said,
the overall goal of this presentation
is to see what leads to comebacks in general.
We also want to look at the halftime deficit,
which is another variable that we created.
It shows the number of points a team trailed by at halftime.
If the deficit was greater than or equal to 10 points,
those were the games that we specifically looked at,
and then we wanted to see whether the plays made throughout those games
led to a comeback in the end.
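The two flag variables can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' actual code: it assumes each play row carries the quarter and the running home and away scores, and the `Play` record, its field names, and the toy game are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Play:
    period: int      # quarter (1-4; 5 and up would be overtime)
    home_score: int  # running home score after this play
    away_score: int  # running away score after this play


def _margin(play, team_is_home):
    """Score margin from the given team's point of view (positive = leading)."""
    diff = play.home_score - play.away_score
    return diff if team_is_home else -diff


def halftime_deficit(plays, team_is_home):
    """Points the team trailed by at halftime (0 if it was tied or leading).

    Assumes plays are in chronological order and the first half has at least one play.
    """
    last_first_half = [p for p in plays if p.period <= 2][-1]
    return max(0, -_margin(last_first_half, team_is_home))


def comeback_flag(plays, team_is_home, threshold=10):
    """1 if the team trailed by >= threshold points at halftime and won, else 0."""
    won = _margin(plays[-1], team_is_home) > 0
    return int(halftime_deficit(plays, team_is_home) >= threshold and won)


# Hypothetical game: the home team trails 40-52 at halftime and wins 101-98.
game = [Play(1, 20, 30), Play(2, 40, 52), Play(3, 70, 75), Play(4, 101, 98)]
print(halftime_deficit(game, team_is_home=True))  # 12
print(comeback_flag(game, team_is_home=True))     # 1
print(comeback_flag(game, team_is_home=False))    # 0
```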
Now, looking at some of the key variables for the salary data:
these as a whole are the variables that we decided
were going to be important for our analysis.
But once again,
I want to focus on a couple of these variables,
as I feel they are the most important to point out and explain.
The first of these is the player involvement variable.
The player involvement variables count each player's individual involvement
in key plays during comebacks.
These plays include shots, rebounds, fouls,
and any of the other actionable plays
that we see throughout the play-by-play data.
So we wanted individual counts
for each player, so we could see, for example,
how many times a certain player
shot the ball throughout the seasons,
and really be able to compare these players
to other players within the league.
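As a rough sketch of the involvement counts, assuming the comeback-game plays have already been reduced to `(player, action)` pairs (a hypothetical shape; the real play-by-play columns may differ), one `Counter` per player is enough:

```python
from collections import Counter, defaultdict


def involvement_counts(events):
    """Count each player's actions (shots, rebounds, fouls, ...) in comeback games.

    events: iterable of (player, action) pairs taken from plays in comeback games.
    Returns {player: Counter({action: count})}.
    """
    counts = defaultdict(Counter)
    for player, action in events:
        counts[player][action] += 1
    return dict(counts)


# Hypothetical events; real player names and action labels would come from the data.
events = [("Player A", "shot"), ("Player A", "shot"),
          ("Player A", "rebound"), ("Player B", "foul")]
counts = involvement_counts(events)
print(counts["Player A"]["shot"])     # 2
print(counts["Player B"]["rebound"])  # 0 (a Counter returns 0 for missing keys)
```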
Next is the comeback score.
This is a min-max scoring method
that we used to score overall player involvement.
This is what we used to really quantify how involved these players were,
and it is built from the player involvement variable
that I just explained.
I want to go a little bit deeper into the comeback score calculation
just to make sure that everyone understands how it was calculated.
As I said, it uses a min-max scoring method,
which we used to determine the involvement of the players
during their team's comeback victories.
The min-max method scores each player's involvement
relative to the range of values that appear for each variable.
It takes the highest count of each type of play as the maximum,
and the lowest count as the minimum, which we typically found to be zero,
since playing time ranged from zero minutes all the way up to hundreds of minutes.
Typically, the players with zero minutes
did not contribute much to these comeback wins.
Below you see the formula that we used for each player
to create the comeback score.
A good example is the assist count
divided by the maximum assist count, times .1667,
where .1667 is 1 divided by 6, the total number of included variables;
that is what we call the weight in the formula.
Each of these variables was weighted equally,
and we summed the weighted min-max scores for all of the variables
and multiplied that by 100 to get the final score.
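Written out, the comeback score described above is score = 100 × Σ (1/k) × (x − min)/(max − min) over the k included variables (k = 6 in the talk, which gives the .1667 weight); with a minimum of zero, each term reduces to count ÷ max × weight, as in the assist example. The function below is a minimal sketch of that calculation. Since the six variables are not all named here, the variables and counts in the usage example are hypothetical.

```python
def comeback_scores(counts, variables):
    """Equal-weight min-max comeback score on a 0-100 scale.

    counts: {player: {variable: count}}; a missing count is treated as 0.
    variables: the variables included in the score (six in the talk).
    """
    weight = 1 / len(variables)  # 1/6 is about .1667 when six variables are used
    lo = {v: min(c.get(v, 0) for c in counts.values()) for v in variables}
    hi = {v: max(c.get(v, 0) for c in counts.values()) for v in variables}
    scores = {}
    for player, c in counts.items():
        total = 0.0
        for v in variables:
            span = hi[v] - lo[v]
            # A variable with no spread contributes 0 instead of dividing by zero.
            scaled = (c.get(v, 0) - lo[v]) / span if span else 0.0
            total += weight * scaled
        scores[player] = 100 * total
    return scores


# Hypothetical counts for three players and two variables.
counts = {
    "Player A": {"assists": 4, "rebounds": 10},
    "Player B": {"assists": 0, "rebounds": 5},
    "Player C": {"assists": 2, "rebounds": 0},
}
print(comeback_scores(counts, ["assists", "rebounds"]))
# {'Player A': 100.0, 'Player B': 25.0, 'Player C': 25.0}
```

Equal weights mean no single play type dominates the score; a team wanting to emphasize, say, shots over fouls would only need to change `weight` per variable.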
We'll now look at the play-by-play analysis method.
With this data, we began by merging.
We had six CSV files,
one for each individual year,
and we combined them all into one central file
so we could look at the play-by-play data from all six years that we had.
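A sketch of the merge step using Python's standard `csv` module, assuming every yearly file shares the same header. The `source` column, the function names, and the two-line toy files are illustrative additions, not part of the original data set.

```python
import csv
import io


def merge_seasons(sources):
    """Stack yearly play-by-play CSVs that share one header into a single list of rows.

    sources: {label: iterable of CSV lines}, e.g. {"2016.csv": open("2016.csv", newline="")}.
    Each row gets a "source" column recording which yearly file it came from.
    """
    header, rows = None, []
    for label, lines in sources.items():
        reader = csv.DictReader(lines)
        if header is None:
            header = reader.fieldnames
        elif reader.fieldnames != header:
            raise ValueError(f"{label}: header does not match the first file")
        for row in reader:
            row["source"] = label
            rows.append(row)
    return rows


def write_merged(rows, out):
    """Write the merged rows to one central CSV (out is a writable text stream)."""
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical two-season example using in-memory lines instead of files.
sources = {
    "2016.csv": ["game_id,event\n", "1,shot\n"],
    "2017.csv": ["game_id,event\n", "2,foul\n"],
}
rows = merge_seasons(sources)
print(rows[1])  # {'game_id': '2', 'event': 'foul', 'source': '2017.csv'}
buffer = io.StringIO()
write_merged(rows, buffer)
print(buffer.getvalue().splitlines()[0])  # game_id,event,source
```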
We then transformed the data using flag variables.
As we said,
we created a column that specified whether there was a comeback or not.
We first looked at the halftime scores
to see which teams were trailing by 10 or more at halftime.
We then took those games
and checked whether a comeback actually occurred.
If it did, we flagged it with a one, and we then pulled the plays that occurred in those games.
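One way the labeling step could look, as a hedged sketch: it assumes per-game `halftime_deficit` and `comeback` values have already been computed, and it keeps both comeback (1) and non-comeback (0) games with a 10+ point halftime deficit, on the assumption that a classifier needs examples of both outcomes. The dictionary layout is hypothetical.

```python
def label_trailing_game_plays(plays, games, threshold=10):
    """Keep plays from games where a team was down >= threshold at halftime,
    tagging each play with that game's comeback flag (1 = came back and won, 0 = lost).

    plays: list of dicts with a "game_id" key.
    games: {game_id: {"halftime_deficit": int, "comeback": 0 or 1}}.
    """
    labelled = []
    for play in plays:
        game = games.get(play["game_id"])
        if game is None or game["halftime_deficit"] < threshold:
            continue  # not a 10+ point halftime deficit game
        labelled.append({**play, "comeback": game["comeback"]})
    return labelled


# Hypothetical games and plays.
games = {
    1: {"halftime_deficit": 12, "comeback": 1},
    2: {"halftime_deficit": 4, "comeback": 0},
    3: {"halftime_deficit": 15, "comeback": 0},
}
plays = [{"game_id": 1, "event": "shot"},
         {"game_id": 2, "event": "shot"},
         {"game_id": 3, "event": "foul"}]
print(label_trailing_game_plays(plays, games))
```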
Then, for the descriptive analysis,
we looked at different graphs in Tableau.
These included how far from the basket the players were shooting,
whether they were missing or making their shots,
the rebound types, and things like that,
to get an idea of what players were doing during the games
and whether they were actually producing the outcomes needed to secure a comeback.
Lastly, we did the predictive analysis in JMP Pro, using a decision tree
to see which plays and sequences of plays produce comebacks
and how we can look at those in the future so that teams
are able to produce more comebacks throughout the season.
Now for the methods of the salary analysis.
First off, we had to do some table joins.
These joins were necessary to get all of the data tables together,
as we needed them all together
to really be able to dive into everything as a whole;
separately, they weren't much help to us.
Then we moved on to some data transformation.
We wrote SQL queries to gather the counts of the key metrics.
This is how we got the counts of shots
for the various players,
along with other things such as rebounds and fouls.
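The join-then-count step might look like the query below. This is a sketch under assumptions: the table and column names (`plays`, `salaries`, `event_type`) are hypothetical, and Python's built-in `sqlite3` stands in for whatever database was actually used. In SQLite a comparison evaluates to 1 or 0, so `SUM(condition)` counts matching rows.

```python
import sqlite3

QUERY = """
SELECT p.player,
       SUM(p.event_type = 'shot')    AS shots,
       SUM(p.event_type = 'rebound') AS rebounds,
       SUM(p.event_type = 'foul')    AS fouls,
       s.salary
FROM plays AS p
JOIN salaries AS s
  ON s.player = p.player AND s.season = p.season
GROUP BY p.player, p.season, s.salary
ORDER BY p.player
"""


def key_metric_counts(conn):
    """Per-player counts of key events joined to that season's salary."""
    return conn.execute(QUERY).fetchall()


def demo():
    """Build a tiny hypothetical in-memory database and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE plays (season INTEGER, player TEXT, event_type TEXT);
        CREATE TABLE salaries (season INTEGER, player TEXT, salary INTEGER);
    """)
    conn.executemany("INSERT INTO plays VALUES (?, ?, ?)", [
        (2017, "Player A", "shot"), (2017, "Player A", "shot"),
        (2017, "Player A", "rebound"), (2017, "Player B", "foul"),
    ])
    conn.executemany("INSERT INTO salaries VALUES (?, ?, ?)", [
        (2017, "Player A", 1_000_000), (2017, "Player B", 500_000),
    ])
    return key_metric_counts(conn)


print(demo())
# [('Player A', 2, 1, 0, 1000000), ('Player B', 0, 0, 1, 500000)]
```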
Then we moved on to some descriptive analysis, completed in Tableau.
With this descriptive analysis,
one of the key things that we were looking at
was the comeback scores, both the actual comeback scores
and the predicted comeback scores, versus the salaries of the players,
so we could see just how well they're performing
relative to their salary.
Finally, we had a predictive analysis:
a linear regression completed in JMP Pro.
I'll go into a little more detail about that later on.
Now we're going to jump
into the descriptive and predictive analyses that we conducted.
We're going to begin with the descriptive analysis first.
Here, we want to look at salary versus comebacks for each NBA team.
If you look at the data points,
you can see that most teams follow the trend line,
meaning that the more they spend on salary,
the more comebacks they tend to produce.
You can see that the Boston Celtics
had the most comebacks, at 14, toward the top,
while the Cleveland Cavaliers paid the highest salary
but had one of the fewest comebacks, with only five.
What we thought was most interesting was the Indiana Pacers,
because not only did they pay such a low salary,
but they were also able to produce 12 comebacks,
which is the third most in the NBA.
I wanted to dig into the Indiana Pacers
and see exactly what they were doing
that allowed them to produce such a high number of comebacks
with such a low payroll.
As Weston said, we wanted to focus on the Indiana Pacers.
Here we see the salaries of the Pacers' players versus their individual comeback scores.
Several highly scored players are found on the Indiana Pacers roster,
as you can see with Myles Turner, Collison, Young, George, and Oladipo.
The top-scoring players are spread across the salary spectrum.
You see cheaper players such as Myles Turner, and Collison,
who is more of a mid-range salary player.
Then you also have more expensive players such as Paul George
and Victor Oladipo, further toward the top right of the graph.
So you can really see how they've spread the wealth across the roster
and are getting maximum performance out of their highly paid players,
but also finding performance from lower-paid players.
You can also see that they have several middle-tier players who come into play
and provide big help to the Pacers,
since the team needs players who can come off the bench,
provide some key plays,
and help produce comebacks.
As I said, one of the key points that I want to make
is that Victor Oladipo, the highest-paid player on the team,
is also the highest performing in terms of comeback score,
so they're definitely getting their money's worth out of him as a player.
All right, so now we're going to get into our predictive analysis.
Starting with the play-by-play data, we decided to build a decision tree
to predict the play types that lead to the Indiana Pacers producing a comeback,
using the variables that you can see below.
We'll see that only a couple of these variables
actually had a large impact in predicting whether the Pacers
would come back from 10 or more points down.
In the decision tree,
there are two nodes in particular.
One is where the shot distance was greater than or equal to 26 feet
from the basket and they were making those shots;
the other is where the shot distance was greater than or equal to 3 feet
from the basket,
which points more toward a layup option.
Now we're going to look at those two nodes in a little more detail.
These branches, as I said,
predict that the Pacers produce comeback victories.
The overall model had a validation misclassification rate of 45.97%.
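To show what a single node like "shot distance ≥ 26 feet" means mechanically, here is a one-split decision stump plus the misclassification rate. This is a swapped-in illustration, not JMP Pro's method: JMP's Partition platform chooses splits with its own criterion (it reports LogWorth for candidate splits), whereas this sketch uses Gini impurity, and the distances and labels are hypothetical. The node's share of comebacks plays the same role as the 62.75% and 75.9% probabilities quoted for the two nodes.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_distance_split(distances, labels):
    """Best single rule 'shot distance >= t' by weighted Gini impurity.

    Returns (t, weighted impurity, share of comebacks in the >= t node).
    """
    n = len(labels)
    best = None
    for t in sorted(set(distances)):
        right = [y for d, y in zip(distances, labels) if d >= t]
        left = [y for d, y in zip(distances, labels) if d < t]
        if not left or not right:
            continue  # a split must leave plays on both sides
        impurity = len(left) / n * gini(left) + len(right) / n * gini(right)
        if best is None or impurity < best[1]:
            best = (t, impurity, sum(right) / len(right))
    return best


def misclassification_rate(predicted, actual):
    """Share of plays whose predicted label differs from the actual label;
    computed on held-out plays, this is the validation misclassification rate."""
    return sum(p != a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical made-shot distances (feet) and comeback labels.
distances = [2, 5, 10, 27, 28, 30]
labels = [0, 0, 0, 1, 1, 1]
print(best_distance_split(distances, labels))              # (27, 0.0, 1.0)
print(misclassification_rate([1, 0, 1, 1], [1, 1, 1, 0]))  # 0.5
```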
As I said,
the model predicts that made shots from 26 feet and further,
and made layups from 3 feet or further from the basket, lead to comebacks.
What we would say is that
they should really focus on the three-point aspect
and on higher-percentage shots such as layups,
because, as you can see, in both of those nodes the prediction was one,
which in this case means that the Pacers were able to produce a comeback.
You can see that in the 26-feet-and-further node,
the probability that the outcome equaled 1 was 62.75%.
Then, when they were shooting layups 3 feet or further from the basket,
the probability of a comeback was 75.9%.
Then, as I said,
there were two variables that seemed the most important,
of the 10 that we looked at,
in predicting whether the Pacers were able to produce a comeback.
The first was shot distance,
which looked at how far from the basket players shot the ball,
and the second was shot outcome,
which is whether they made or missed the shot.
For distance, as I said, that means
26 feet or further, which is just beyond the three-point line,
or the higher-percentage shots close to the basket, such as layups.
And if you're making more shots,
you're scoring more, which gives you a better chance of coming back.
All right, so now moving into
the linear regression portion of our predictive analysis.
This regression, as I said, was completed in JMP Pro,
and it was done to predict the comeback scores of individual players
based on the variables that you see there on screen.
A couple of the key ones to point out are the individual player salaries,
their team name, and their career statistics,
as you see with all those different variables there.
It's also important to note
that the variables were selected for this regression
based on their level of significance.
If the variable was not found to be significant,
it was not included in the regression.
Going into the summary of fit for this linear regression,
I do want to point out that it has a low RSquare,
but this is not a primary concern for our analysis.
We knew that the comeback score
was based on the comeback involvement statistic,
but we now wanted to know
what the score would be based on completely different variables.
So instead of using the variables
that we used to create the statistic initially,
we're now using new variables to predict what it should be,
like I said, their salary and career stats.
That means the predictions would vary from the original scores,
and that was not only expected in our analysis,
but it was also desired that we came up with different scores,
to really see how players were supposed to perform.
Now, based on this analysis,
we were able to identify some of the most important variables.
The first, and the most important, was salary.
Something that we were seeing
is that higher-paid players were predicted to perform better,
which is something that you would definitely expect
in the actual NBA:
players like Victor Oladipo or LeBron James,
with higher salaries, would be expected to perform better
than those with lower salaries.
Moving on from there, we also have the team.
This one definitely makes sense,
as some of the top teams
in predicting comeback scores
are the Golden State Warriors and the Indiana Pacers,
which are a couple of the teams that we saw
had the highest numbers of comeback victories
over the seasons that we were looking at.
Then we also had a couple of career stats that really popped up
and stood out as some of the most important variables
for this regression.
The first was the players' career total rebounds,
followed by career points.
Players with higher career total rebounds and higher career points
were expected to produce more value
when it came to creating comeback victories.
I'll also note that the importance of these variables
was measured by LogWorth.
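LogWorth, as JMP reports it, is −log10 of an effect's p-value, so a p-value of 0.01 has a LogWorth of 2 and smaller p-values rank higher. A minimal sketch of ranking variables that way; the p-values below are hypothetical, not the regression's actual output.

```python
import math


def logworth(p_value):
    """LogWorth = -log10(p-value); larger values mean stronger evidence for the effect."""
    return -math.log10(p_value)


def rank_by_logworth(p_values):
    """Order variable names from most to least important by LogWorth."""
    return sorted(p_values, key=lambda name: logworth(p_values[name]), reverse=True)


# Hypothetical p-values, not the regression's actual output.
p_values = {"salary": 1e-5, "team": 1e-3, "career_points": 0.02}
print(rank_by_logworth(p_values))  # ['salary', 'team', 'career_points']
print(round(logworth(0.01), 6))    # 2.0
```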
Now we're going to look at the conclusions of the presentation.
Okay, so going into some of the Indiana Pacers predictions,
I specifically want to point out some of the Pacers' top performers
and underperformers.
The blue dots that you see there are the Pacers' top performers
that we saw in the earlier graph
of actual comeback score versus salary,
whereas now we are looking at
the predicted comeback score versus salary.
The orange Xs mark the Pacers' underperformers.
In this graph, the underperformers with the orange Xs
are predicted to score much higher
than their teammates, whereas by their actual scores,
they find themselves in the middle-to-lower end of the pack
relative to their teammates,
which really shows us that they're not performing
up to what their salary and career statistics
say they should be performing,
particularly when it comes to creating a comeback victory.
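The underperformer comparison above is about where a player ranks among teammates on predicted versus actual score. A hedged sketch of that rule: the `min_drop` cutoff of three places is a hypothetical choice, not one stated in the talk, and the scores are made up.

```python
def ranks(scores):
    """Rank players within a team, 1 = highest score."""
    order = sorted(scores, key=scores.get, reverse=True)
    return {player: i + 1 for i, player in enumerate(order)}


def underperformers(actual, predicted, min_drop=3):
    """Players whose actual rank is at least `min_drop` places below their predicted rank."""
    a, p = ranks(actual), ranks(predicted)
    return sorted(player for player in actual if a[player] - p[player] >= min_drop)


# Hypothetical scores for one team's roster.
actual = {"A": 90, "B": 80, "C": 70, "D": 60, "E": 50}
predicted = {"A": 10, "B": 80, "C": 70, "D": 60, "E": 90}
print(underperformers(actual, predicted))  # ['E']
```

Ranking within the team, rather than comparing raw scores, matches how the graph is read: a player predicted near the top of the roster who actually lands mid-pack gets flagged.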
But it is important to point out
that the team has done a great job of signing inexpensive players
who produce comeback wins.
We see players such as Myles Turner, Collison, or Young
who have somewhat lower salaries
but also produce a lot of plays
that help create a comeback win.
Then we also wanted to point out
some of the Cleveland Cavaliers predictions
and the shortcomings that go with them.
The Cavs should have multiple high-tier comeback players.
One in particular to point out is Kevin Love.
Kevin Love is there at the top of the graph,
and he has both an orange X and a blue dot next to his name.
That marks that he was one of the actual top performers
for the Cavaliers,
but at the same time, he's underperforming greatly.
In our predictions,
we can see that he's predicted to perform better than LeBron James,
which is very interesting to point out.
Like I said, based on our predictions from salary and career stats,
we would expect Kevin Love to outperform LeBron James
when it comes to producing comeback wins.
But in reality, he's actually quite far down the list:
he still remains one of the top performers,
but he does not produce nearly as much as LeBron James does.
Now we also wanted to look at the best-value players in the NBA.
We show the top five here.
Looking at their predicted comeback scores,
you have players such as Karl-Anthony Towns, Joel Embiid,
and Ben Simmons,
who were predicted to be some of the higher-performing players
in the entire league.
But as you can also see, when this data was taken,
they had relatively low salaries compared to other players.
What we recommend here
is giving these players the contracts that they deserve,
as they help teams produce comebacks and
provide statistics that allow teams to perform their best.
They're definitely contributing more than what they're currently paid.
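"Best value" can be read as predicted comeback score per dollar of salary. A minimal sketch under that assumption: the per-$1M scaling, the players, and the numbers are all hypothetical, and the presenters' exact ranking rule is not stated.

```python
def best_value(predicted, salary, top_n=5):
    """Players with the highest predicted comeback score per $1M of salary."""
    value = {p: predicted[p] / (salary[p] / 1_000_000) for p in predicted}
    return sorted(value, key=value.get, reverse=True)[:top_n]


# Hypothetical predicted scores and salaries.
predicted = {"A": 50, "B": 50, "C": 30}
salary = {"A": 5_000_000, "B": 25_000_000, "C": 2_000_000}
print(best_value(predicted, salary, top_n=2))  # ['C', 'A']
```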
Then we also wanted to look at the best lineup predictions
for the Pacers.
As I mentioned earlier,
there's the emphasis on three-pointers and high-percentage shots,
so build the lineup around shooting threats from distance.
You have players such as Robinson, Joseph, and George,
whose average shot distance is about 15 feet and beyond,
when the three-point line is roughly 22 to 24 feet out,
which shows that they are shooting a lot of threes
and also making them.
Not only are they shooting from that far,
but they're also more likely to make their shots,
so they would be good to have in the lineup
whenever you are trying to produce a comeback,
as they're more efficient.
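A player's shooting profile of the kind used here, the average distance of made shots and how many of those makes come from 26 feet or further, can be computed directly from play-by-play shot rows. This is a sketch assuming hypothetical `(player, distance_ft, made)` tuples; the 26-foot cutoff mirrors the decision-tree node.

```python
from collections import defaultdict


def shooting_profile(shots, deep=26):
    """shots: iterable of (player, distance_ft, made).

    Returns {player: (average distance of made shots, share of makes from >= deep ft)}.
    Players with no made shots are left out.
    """
    made = defaultdict(list)
    for player, distance, is_made in shots:
        if is_made:
            made[player].append(distance)
    return {
        player: (sum(d) / len(d), sum(x >= deep for x in d) / len(d))
        for player, d in made.items()
    }


# Hypothetical shots.
shots = [("A", 27, True), ("A", 3, True), ("A", 20, False), ("B", 10, True)]
print(shooting_profile(shots))  # {'A': (15.0, 0.5), 'B': (10.0, 0.0)}
```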
Also, because they can shoot from deep,
you'd expect that they also have a solid game down low,
able to get a quick layup
and get those higher-percentage shots to go in as well.
As I mentioned,
an average made-shot distance near the three-point line is
very important for the Pacers in particular
to be able to produce a high number of comebacks.
This analysis confirms what is already going on in the NBA.
Typically, teams that find themselves down by a certain number at halftime
will throw up a few more three-point shots,
but they don't really focus on that high-percentage look
down low at the basket.
We also think that they should focus on drawing up plays
that allow them to get a quick layup and build momentum from that
as they try to produce a comeback later on.
All right, so that wraps up our presentation.
We just want to say a quick thank you,
and this is where we would open it up to questions.