Evaluating Which Plays and Players Provide the Best Opportunity for a Comeback i...

Today we're going to talk about

what provides the best opportunity for a comeback in the NBA.

So my name is Weston Salmon.

I'm currently a student at Oklahoma State University

studying for Business Analytics and Data Science in our Masters program.

And my name is Zach Miller.

I'm also a student at Oklahoma State University

and also studying for my Masters in Business Analytics and Data Science.

All right, so we're going to cover the table of contents real quick,

that shows what we're doing throughout the presentation.

First we're going to begin with an introduction

that discusses why we're here and what exactly the study is for.

Then we're going to jump into our data and methods,

and that looks at what does the data look like,

how did we complete our analysis,

and the different ways we manipulated our data.

We'll then look at the descriptive and predictive portions

that show what can we derive from the data as it sits

and then what do our predictions look like.

Then at the end,

we'll conclude the presentation

and look at what are the implications of the analysis

and how it can be used by NBA teams in the future.

Here's a quote by Gabe Frank.

He's the Director of Basketball Analytics with the San Antonio Spurs.

We thought this would be a good quote

to throw in as it deals with [inaudible 00:01:16]

and also the NBA in general.

So he said,

"I think analytics have grown in popularity

because it can give you a competitive advantage

if you do it well.

Every little bit helps."

Through our presentation, we're going to discuss

NBA analytics and how they can produce comebacks

based on the data that we find.

We thought this quote really spoke to the overall objective of our project.

Now I'll pass it off to Zach to introduce the project as a whole.

Thanks, Weston.

Going into the introduction here, going into an NBA season,

every team has one common goal, and that's to win the championship.

Like I said, going into the season, a lot of teams hope for 40 plus wins

in their 82-game season,

but a great season typically results in 50 plus wins.

Then also our primary interest for this presentation and analysis

is in the hard-fought victories, also known as comeback victories,

by these different teams.

We define a comeback victory

as the winning team losing by 10 plus points at halftime.

Even when teams were down by 10 or more points at halftime,

we were seeing that every so often

they were taking the lead by the end of the game and ultimately winning.

Then finally,

we have our analysis that utilizes play-by-play data and salary data sets,

which we will go into a little more detail in just a few minutes.

Now we're going to discuss the two business questions

that we want to answer using both data sets.

First, looking at the play-by-play data,

we want to know exactly what plays or sequence of plays within the game

give the team the greatest chance at a comeback victory.

We want to know what exactly players and coaches can do

and draw up in the lineup to produce a comeback.

Then with the salary and career stats data

we really want to see how those variables,

the salary and career stats,

can be used to determine how involved a player should be in comeback victories.

So not necessarily just how well they actually perform,

but also how they should perform based on these variables.

So which players are underperforming

and overperforming according to their contract and track record?

Next we'll discuss the data and methods that we used for both the data sets.

Like I said, we had two data sets.

The first one focused on play-by-play data.

This data contains the process and outcome of every play

within every game from 2015 through 2018.

So it included all 30 NBA teams and exactly what they did

in every single play throughout the games throughout these years.

Zach will also talk about the salary data that we have.

Right, so going into the salary data, it contains the salary information

for each of the players

that were mentioned in the play-by-play data.

Whether that player that was mentioned in the play-by-play data

played one minute of NBA time or hundreds of minutes of NBA time,

they were appearing in my salary data.

I could then go and see what their career stats and what their salary was

for the seasons that we were looking at within the play-by-play data.

Now we're going to look at the key variables

that we found within the play-by-play data set.

You can see things such as comeback,

halftime deficit, the shot distance, outcome type, and rebound type.

But there are two that we want to focus on in particular. The first was the comeback.

That's a variable [inaudible 00:04:51]

it was our flag variable and used as our predictor.

We flagged a one next to all games

where a team trailed by 10 or more points at halftime

and came back and won and it was a zero if not,

because as we said,

the overall goal of this presentation

is to see what leads to a comeback in general.

Then also we want to look at the halftime deficit,

which was another variable that we created.

This shows the number of points a certain team trailed by at halftime.

If the deficit was greater than or equal to 10 points,

then those were the games that we specifically looked at,

and then we want to see if the plays made throughout those games

led to a comeback in the end.
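
The two created variables described above can be sketched in a few lines of Python. This is only an illustration: the `TeamGame` record, its field names, and the scores are made up for the example and are not the presenters' actual schema or data. The only thing taken from the talk is the rule itself (trailing by 10 or more at halftime and still winning gets a 1, everything else a 0).

```python
from dataclasses import dataclass


@dataclass
class TeamGame:
    """One team's view of one game (hypothetical layout, not the real data set)."""
    team: str
    halftime_points: int
    opponent_halftime_points: int
    final_points: int
    opponent_final_points: int


def halftime_deficit(g: TeamGame) -> int:
    """Points the team trailed by at halftime (negative if it was leading)."""
    return g.opponent_halftime_points - g.halftime_points


def comeback_flag(g: TeamGame, threshold: int = 10) -> int:
    """1 if the team trailed by `threshold` or more at halftime and still won, else 0."""
    trailed = halftime_deficit(g) >= threshold
    won = g.final_points > g.opponent_final_points
    return 1 if trailed and won else 0


# Made-up scores, purely for illustration.
games = [
    TeamGame("IND", 45, 58, 110, 104),  # down 13 at the half, won
    TeamGame("IND", 50, 55, 101, 99),   # down 5 at the half, won
    TeamGame("IND", 40, 55, 90, 102),   # down 15 at the half, lost
]
flags = [comeback_flag(g) for g in games]
print(flags)
```

Only the first game counts as a comeback: the second deficit is too small, and the third game was lost.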

Now looking at some of the key variables for the salary data.

These as a whole are variables that we decided

were going to be important for our analysis.

But once again,

I wanted to focus on a few of these or a couple of these variables

as I feel that they are more important to point out and explain.

First of which being the player involvement variable.

The player involvement variable is a count of individual involvement

on key plays during comebacks.

These plays could include shots, rebounds, fouls,

any of the actionable plays

that we see throughout the play-by-play data.

So we wanted to take individual counts

from players so we could see

how many times a certain player

was shooting the ball throughout the seasons,

and really be able to compare these players

to other players within the league.

Then going to comeback score.

This is a min-max scoring method

that we use to score the overall player involvements.

This is what we use to really quantify how involved these players were.

This is utilizing the player involvement variable

that you see that I just explained.

I wanted to go a little bit deeper into the comeback score calculation

just to make sure that everyone understands how this was calculated.

As I said, it was a min-max scoring method,

and this was used to determine the involvement of the players

during their team's comeback victories.

This min-max method creates the scores,

taking the player's involvement into account,

relative to the range of values that appear for each variable.

It would take the maximum count of these different plays

and it would use that as the maximum and then a minimum

of typically what we found would be zero as certain players would only play

a very low amount of minutes from zero and all the way up to hundreds of minutes.

But typically with the zero minute players,

we found that they did not contribute much to these comeback wins.

Below you see the formula that we used for each of the players

to create this comeback score.

This is a perfect example as we see the assist count

divided by the maximum assist count times 0.1667,

with 0.1667 being 1 divided by the total of the 6 included variables,

which is what we would call the weight for the formula.

Each of these variables was weighted equally

and we took the min-max score for each of the variables

and multiplied that by 100 to get the final score.
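
A minimal sketch of the comeback score as described: each count is min-max scaled against the league range, the six scaled values are weighted equally (1/6, about 0.1667), and the result is multiplied by 100. The talk names only some of the six inputs, so the keys `other_a` and `other_b` below are placeholders, and all counts are invented for the example.

```python
def min_max(value: float, lo: float, hi: float) -> float:
    """Scale `value` into [0, 1] relative to the observed range [lo, hi]."""
    if hi == lo:
        return 0.0  # no spread in the data, so nothing to scale against
    return (value - lo) / (hi - lo)


def comeback_score(counts: dict, league_min: dict, league_max: dict) -> float:
    """Equal-weight average of min-max scaled counts, on a 0 to 100 scale."""
    scaled = [min_max(counts[k], league_min[k], league_max[k]) for k in counts]
    # Dividing by the number of variables is the same as weighting each by 1/6.
    return 100 * sum(scaled) / len(scaled)


# Placeholder variable names and made-up counts, for illustration only.
variables = ["assists", "shots", "rebounds", "fouls", "other_a", "other_b"]
league_min = {k: 0 for k in variables}   # the talk notes the minimum was typically zero
league_max = {k: 40 for k in variables}

league_leader = {k: 40 for k in variables}
role_player = {k: 10 for k in variables}

print(comeback_score(league_leader, league_min, league_max))
print(comeback_score(role_player, league_min, league_max))
```

With a minimum of zero, the scaled value reduces to count divided by maximum count, which matches the assist example in the talk.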

We'll now look at the play-by-play analysis method.

When taking this data, we first began by merging.

We had six CSV files,

one that identified each individual year,

and we combined all those into one central file

so we could look at each play-by-play data from the six years that we had.
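
The merge step could look something like the sketch below, using only Python's standard `csv` module. The talk does not say which tool was used for the merge, so this is an assumption, as are the file names and the two-column layout; the demo builds two tiny throwaway files rather than touching real data.

```python
import csv
import tempfile
from pathlib import Path


def merge_season_files(paths, out_path) -> int:
    """Concatenate CSV files that share a header into one file; return rows written."""
    rows_written = 0
    writer = None
    with open(out_path, "w", newline="") as out:
        for path in paths:
            with open(path, newline="") as f:
                reader = csv.DictReader(f)
                if writer is None:
                    # Take the column names from the first file.
                    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
                    writer.writeheader()
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written


# Tiny made-up files standing in for the per-season exports.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "season_a.csv").write_text("game_id,play\n1,made shot\n2,rebound\n")
    (tmp / "season_b.csv").write_text("game_id,play\n3,foul\n")
    merged = tmp / "all_seasons.csv"
    n_rows = merge_season_files(sorted(tmp.glob("season_*.csv")), merged)
    merged_lines = merged.read_text().splitlines()

print(n_rows, merged_lines[0])
```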

We then transformed the data using flag variables.

As we said,

we created a column that specified whether there was a comeback or not.

We first looked at the halftime scores

and saw if teams were trailing by 10 or more at halftime.

We would then take those games

and then see if a comeback actually occurred.

If it did, we flagged it with a one and specifically looked at the plays that occurred.

Then for the descriptive analysis,

we looked at different graphs within Tableau.

These included things as how far away the players were shooting from the basket,

and whether they were missing or making their shots,

the rebound types, and things like that

to get an idea of what players were doing during the games,

if they were actually producing good outcomes to secure a comeback.

Then lastly, the predictive analysis we did in JMP Pro using a decision tree

to see which plays and sequences of plays produce the comeback

and how we can better look at those in the future to then have teams

be able to produce more comebacks throughout the season.

Now for the methods with the salary analysis.

First off, we had to do some table joins.

These joins were necessary to get all of the data tables together

as we needed them all together

to really be able to dive into everything as a whole.

Separated, it wasn't too much help for us.

Then we moved on to some data transformation.

We wrote SQL queries to gather the counts of the key metrics.

This is how we got the counts of shots

for the various players

along with other things such as rebounds or fouls.
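
A count query of the kind described might look like the following sketch, run here against an in-memory SQLite database from Python. The table name `plays`, its columns, and the rows are all hypothetical; the talk only says that SQL queries were used to count plays per player.

```python
import sqlite3

# Hypothetical table: one row per play, with the comeback flag carried along.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plays (player TEXT, play_type TEXT, comeback INTEGER)")
conn.executemany(
    "INSERT INTO plays VALUES (?, ?, ?)",
    [
        ("Player A", "shot", 1),
        ("Player A", "shot", 1),
        ("Player A", "rebound", 1),
        ("Player B", "foul", 1),
        ("Player B", "shot", 0),  # not in a comeback game, so it is excluded below
    ],
)

# Count each player's involvement by play type, comeback games only.
query = """
    SELECT player, play_type, COUNT(*) AS n
    FROM plays
    WHERE comeback = 1
    GROUP BY player, play_type
    ORDER BY player, play_type
"""
counts = conn.execute(query).fetchall()
conn.close()

for player, play_type, n in counts:
    print(player, play_type, n)
```

Counts like these are what would feed the player involvement variable and, from there, the comeback score.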

Then we moved on to some descriptive analysis that was completed in Tableau.

With this descriptive analysis,

one of the key things that we were looking at

was the comeback scores, the actual comeback scores,

and the predicted comeback scores versus the salary of the players.

So we could see just how well they're performing

relative to their salary.

Then finally we had a predictive analysis.

We did a linear regression that was completed in JMP Pro.

I will go into a little bit more detail about that a little bit later on.

Now we're going to jump

into those descriptive and predictive analyses that we conducted.

We're going to begin with the descriptive analysis first.

Here, we want to look at the salary versus comebacks by each NBA team.

If you look at the data points,

you can see that most teams follow the trend line,

meaning that as they spend more money on their teams and salary,

they also produce a greater number of comebacks.

So you can see that the Boston Celtics

had the most comebacks at 14 towards the top,

and then the Cleveland Cavaliers have the highest salary paid,

but also one of the fewest comebacks with only five comebacks.

What we thought was the most interesting was the Indiana Pacers,

because not only did they pay such a low salary,

but they were also able to produce 12 comebacks

which is the third most throughout the NBA.

I wanted to hone in on the Indiana Pacers

and see what exactly they were doing

that allowed them to produce such a high number of comebacks

with such a low salary rate.

As Weston said, we wanted to focus on the Indiana Pacers.

Here we see the salary of Pacers' players versus their individual comeback scores.

Several highly scored players are found within the Indiana Pacers roster,

as you can see with Myles Turner, Carlson, Young, George, and Oladipo.

The top scored players are spread across the salary spectrum.

So you see some cheap players such as Myles Turner or Carlson

being more of a mid-range salary player.

Then you also have more expensive players such as Paul George

or Victor Oladipo further towards the top right of the graph there.

So you can really see how they've spread the wealth out across

and are getting maximum performance out of their highly paid players,

but also finding performance out of lower paid players.

You can also see that they have several middle tier players that come into play

and provide big help to the Pacers

as they need some players to come off the bench

and be able to provide

some key value plays and produce comebacks.

As I said, one of the key points that I want to point out

was that Victor Oladipo, the highest paid player on the team,

is also the highest performing in terms of comeback score,

so they're definitely getting their worth out of him as a player.

All right, so now we're going to get into our predictive analysis.

To begin with the play-by-play data, we decided to make a decision tree

to predict the play type that leads to the Indiana Pacers producing a comeback

using the following variables that you can see below.

We'll see that only a couple of these variables

actually had a huge impact in predicting whether the Pacers

will come back from 10 or more points.

In the decision tree,

there are two nodes in particular.

One where the distance shot was greater than or equal to 26 feet

from the basket, and they were making those,

as well as having a shot distance of greater than or equal to 3 feet

from the basket,

meaning that they're looking at more of a layup option.

Now we're going to look at those two nodes a little bit more in particular.

These branches, as I said,

predict that the Pacers produce comeback victories.

In the overall model we had a validation misclassification rate of 45.97%.

As I said,

the model predicts that made shots of 26 feet and further

and made layups 3 feet or further from the basket lead to comebacks.

What we would say is,

they should really focus on the three-point aspect

and higher percentage shots such as layups,

because as you can see in both of those, the prediction was one,

which in this case means that the Pacers were able to produce a comeback.

You can see that with the 26 feet and further node,

the probability that it equalled 1 was 62.75%.

Then when you were shooting layups 3 feet or further from the basket,

you had a probability that you would win of 75.9%,

or come back at 75.9%.
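
For readers less familiar with decision trees, the sketch below shows how a leaf probability, a leaf prediction, and a misclassification rate are computed in general. It is not JMP Pro's implementation, and the labels are made up; only the 45.97% figure comes from the talk.

```python
def leaf_probability(labels):
    """Share of rows in a leaf whose comeback flag is 1."""
    return sum(labels) / len(labels)


def predict(prob, cutoff=0.5):
    """A leaf predicts 1 when its comeback share reaches the cutoff."""
    return 1 if prob >= cutoff else 0


def misclassification_rate(actual, predicted):
    """Fraction of rows where the predicted flag differs from the actual flag."""
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return wrong / len(actual)


# Made-up leaf: three of its four plays came from comeback games.
leaf_labels = [1, 1, 1, 0]
p = leaf_probability(leaf_labels)
leaf_prediction = predict(p)

# Made-up validation rows: two of five are predicted incorrectly.
actual = [1, 0, 1, 0, 1]
predicted = [1, 1, 1, 0, 0]
rate = misclassification_rate(actual, predicted)

# The reported validation misclassification rate of 45.97% corresponds to
# this share of validation rows being classified correctly.
reported_accuracy = round(1 - 0.4597, 4)
print(p, leaf_prediction, rate, reported_accuracy)
```

Read this way, the 62.75% and 75.9% figures are leaf probabilities, and the prediction of one in both nodes follows from each being above one half.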

Then as I said,

there were two variables that seem the most important

of the 10 that we looked at

in predicting why the Pacers were able to produce a comeback.

The first was shot distance,

which looked at how far players shot the ball,

and then also shot outcome.

That's whether they made or missed the shot.

With the distance, as I said,

26 feet or further, which is about the three-point range,

or some of those higher percentage shots in the play for a layup.

Then also if you're making more shots,

you're producing a higher score giving you a better chance of coming back.

All right, so now moving into

our linear regression portion of our predictive analysis.

This regression, as I said, was completed in JMP Pro,

and this was done to predict the comeback scores of individual players

based on the following variables that you see there on screen.

A couple of the key ones to point out would be their individual player salaries,

their team name, and then their career statistics,

as you see with all those different variables there.

It's also important to note

that the variables were selected for this regression

based on their level of significance.

If the variable was not found to be significant,

it was not included in the regression.

Going into the summary of fit for this linear regression,

I do want to point out that it does have a low RSquare,

but this is not a primary concern for our analysis.

We knew that the comeback score

would be based on the comeback involvement statistic,

but we now wanted to know

what the score would be based on completely different variables.

So instead of using the variables

that we used to create the statistic initially,

we're now using new variables to try to predict what it should be based on,

like I said, their salary and career stats.

That means that the predictions would vary from the original scores

and that was not only expected in our analysis,

but it was also desired that we came up with different scores

to really see how they were supposed to perform.
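
The idea of comparing actual scores against scores predicted from other variables can be sketched as below. This is a deliberate simplification: it fits a one-predictor least-squares line (salary only) in plain Python, whereas the analysis in the talk used several predictors in JMP Pro. Player names, salaries, and scores are invented.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up players: salary in millions and actual comeback score.
players = ["Player A", "Player B", "Player C", "Player D"]
salary = [2, 4, 6, 8]
actual = [10, 30, 20, 40]

slope, intercept = fit_line(salary, actual)
predicted = [intercept + slope * s for s in salary]

# Positive residual: doing more than predicted. Negative: doing less.
labels = {}
for name, a, p in zip(players, actual, predicted):
    labels[name] = "overperforming" if a - p > 0 else "underperforming"

print(slope, intercept)
print(labels)
```

Because the predictions come from different inputs than the score itself, the gaps between actual and predicted values are the point of the exercise rather than a flaw in the fit.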

Now, based on this analysis,

we were able to come up with some of the most important variables.

The first of which that we saw was most important was salary.

Something that we were seeing

is that higher paid players were predicted to perform more,

which is something that you would definitely see more

in the actual NBA.

Seeing that players like Victor Oladipo or LeBron James

with higher salaries paid to them would be performing better

than those with lower salaries.

Then moving on from there, we also have the team.

This one definitely makes sense

as you see that some of the top teams

that it was looking at for a comeback victory

and predicting the comeback scores

is the Golden State Warriors and the Indiana Pacers,

which is a couple of the teams that we saw

had the highest number of comeback victories

over the seasons that we were looking at.

Then we also had a couple of career stats that really popped up

and showed to be a couple of the most important variables

for this regression.

The first of which being the career total rebounds by the players,

and then that was followed by the career points.

Seeing that a player had higher career total rebounds and higher points,

we expected those players to produce more

value whenever it came to creating a comeback victory.

I'll also note that these important variables

were calculated through the LogWorth.

Now we're going to look at the conclusions of the presentation.

Okay, so going into some of the Indiana Pacers predictions,

I specifically want to point out some Pacers' top performers

and underperformers.

The blue dots that you see there are the actual Pacers' top performers

that we saw in the earlier graphs

of the actual comeback score versus the salary,

whereas now we are looking at the predicted score.

The orange Xs mark the Pacers' underperformers.

The underperformers in this graph with the orange Xs,

we are seeing them predicted to be relatively much higher

than their teammates, whereas with their actual scores,

they are finding themselves more middle-to-lower-end of the pack

relative to their teammates,

which really shows us that they're not performing

up to what their salary and career statistics

say that they should be performing

particularly when it comes to creating a comeback victory.

But it is important to point out

that the team has done a great job of signing inexpensive players

that produce comeback wins.

We see those players such as Myles Turner or Carlson, or Young

that have a little bit lower salaries,

but they also produce a lot of plays

that can help with creating a comeback win.

Then we also wanted to point out

some of the Cleveland Cavaliers predictions

and their faults that go with them.

The Cavs should have multiple high-tier comeback players.

One specifically to point out would be Kevin Love.

Kevin Love is there at the top of the graph

and he has both an orange X and a blue mark next to his name.

That just marks that he was one of the actual top performers

for the Cavaliers,

but at the same time, he's underperforming greatly.

So in our predictions,

we can see that he's predicted to actually perform better than LeBron James,

which is something that is very interesting to point out.

Like I said, with our predictions based on salary and their career stats,

we would expect Kevin Love to outperform LeBron James

when it came to producing comeback wins.

But in reality, he's actually quite far down the list

and he still remains one of the top performers,

but he does not produce nearly as much as LeBron James does.

Now we also wanted to look at the best valued players in the NBA.

We show the top five here.

Looking at their predicted comeback scores,

you have people such as Karl-Anthony Towns, Joel Embiid,

and Ben Simmons

who were predicted to be some of the higher performing players

in the entire league.

But as you can also see, when this data was taken,

they had relatively low salaries compared to other players.

What we recommend here

is definitely giving these players the contracts that they are deserving of,

as they help teams produce comebacks and obviously

provide statistics that allow teams to perform their best.

They're definitely doing more for what they're actually worth.

Then we also wanted to look at the best lineup predictions

for the Pacers.

As I mentioned earlier,

there's that three-point and high percentage emphasis.

So build up the lineup of shooting threats from distance.

You have people such as Robinson, Joseph, and George.

Their average shot distance is about 15 feet and beyond

when the three-point line is about 25 feet,

so that shows that they are shooting a lot of threes,

but they're also making it.

Not only are they shooting from that far,

but they're also more likely to make their shots,

so those people would be good to have in the lineup

whenever you are trying to produce a comeback

as they're more efficient.

Also because they can shoot from deep,

you'd expect that they also have a solid play down low

to be able to get a layup real quick

and get those higher percentage shots to go in as well.

As I mentioned,

an average distance of made shots near the three-point line is

very important for the Pacers in particular

to be able to produce a high number of comebacks.

This analysis confirms what is already going on in the NBA.

Typically, teams who find themselves down by a certain number at halftime

will throw up a few more three-point shots,

but also they don't really focus on that high percentage look

just from down low into the basket.

We also think that they should focus on drawing up plays

that allow them to just get a quick layup and build momentum upon that

as they try to produce a comeback later on.

All right, so that wraps up our presentation.

We just want to say a quick thank you,

and this is where we would open it up to questions.