
Evaluating Which Plays and Players Provide the Best Opportunity for a Comeback in the NBA (2022-US-30MP-1167)

The National Basketball Association (NBA) has long been a league where impressive events happen on a regular basis. Because of this, the standard for noteworthy achievements has been raised time and time again. You will frequently see social media posts from top sports accounts about players scoring 30+ points or making incredible dunks. However, you rarely see a post about a comeback victory unless the team overcame the seemingly insurmountable odds of being down by what appears to be too many points to win. This presentation considers the individual plays, players, and salary figures that give a team the best chance at achieving a comeback victory. Descriptive analytics are used to gain insights into how often a team can produce a comeback when trailing by ten or more points at half, which teams achieve them most often, which players are most involved in producing comebacks, and the salaries of those players. The predictive portion of the analysis focuses on predicting player involvement based on career stats and salary figures, and on identifying the types of plays NBA teams can prioritize to achieve comeback victories more often.

 

 

Today we're going to talk about what provides the best opportunity for a comeback in the NBA.

My name is Weston Salmon. I'm currently a student at Oklahoma State University, studying Business Analytics and Data Science in our master's program.

And my name is Zach Miller. I'm also a student at Oklahoma State University, also studying for my master's in Business Analytics and Data Science.

All right, so we're going to cover the table of contents real quick, which shows what we're doing throughout the presentation. First, we're going to begin with an introduction that discusses why we're here and what exactly the study is for. Then we're going to jump into our data and methods, which looks at what the data look like, how we completed our analysis, and the different ways we manipulated our data. We'll then look at the descriptive and predictive portions, which show what we can derive from the data as it sits and what our predictions look like. Then, at the end, we'll conclude the presentation and look at the implications of the analysis and how it can be used by NBA teams in the future.

Here's a quote by Gabe Frank. He's the Director of Basketball Analytics with the San Antonio Spurs. We thought this would be a good quote to include, as it deals with [inaudible 00:01:16] and also with the NBA in general. He said, "I think analytics have grown in popularity because it can give you competitive advantage if you do it well. Every little bit helps."

Through our presentation, we're going to discuss NBA analytics and how they can produce comebacks, based on the data that we found. We thought this quote really spoke to the overall objective of our project. Now I'll pass it off to Zach to introduce the project as a whole.

Thanks, Weston. Going into an NBA season, every team has one common goal, and that's to win the championship. A lot of teams hope for 40-plus wins in their 82-game season, but a great season typically results in 50-plus wins. Our primary interest for this presentation and analysis, though, is in hard-fought victories, also known as comeback victories.

We define a comeback victory as a game in which the winning team was trailing by 10 or more points at halftime. Even when teams were down by 10 or more points at halftime, we saw that every so often a team would take the lead and ultimately win the game. Finally, our analysis uses play-by-play and salary data sets, which we will go into in a little more detail in just a few minutes.

Now we're going to discuss the two business questions that we want to answer using both data sets. First, looking at the play-by-play data, we want to know exactly which plays or sequences of plays within a game give a team the greatest chance at a comeback victory: what exactly players and coaches can do and draw up in the lineup to produce a comeback.

Then, with the salary and career stats data, we want to see how those variables can be used to determine how involved a player should be in comeback victories. So not just how well players actually perform, but also how they should perform based on these variables. In other words, which players are underperforming and which are overperforming according to their contract and track record?

Next, we'll discuss the data and methods that we used for both data sets. Like I said, we had two data sets. The first one focused on play-by-play data. This data contains the process and outcome of every play within every game from 2015 through 2018, so it includes all 30 NBA teams and exactly what they did on every single play throughout those seasons. Zach will also talk about the salary data that we have.

Right, so going into the salary data, it contains the salary information for each of the players who were mentioned in the play-by-play data. Whether a player played one minute of NBA time or hundreds of minutes, they appear in the salary data. I could then see each player's career stats and salary for the seasons that we were looking at within the play-by-play data.

Now we're going to look at the key variables that we found within the play-by-play data set. You can see things such as comeback, halftime deficit, shot distance, outcome type, and rebound type. But there are two that we want to focus on in particular. The first is comeback. That's a variable [inaudible 00:04:51]; it was our flag variable and was used as our response (target) variable. We flagged a one next to every game where a team trailed by 10 or more points at halftime and came back and won, and a zero if not, because, as we said, the overall goal of this presentation is to see what leads to comebacks in general.

Then we also want to look at the halftime deficit, which was another variable that we created. This shows the number of points a team trailed by at halftime. If the deficit was greater than or equal to 10 points, those were the games that we specifically looked at, and then we wanted to see whether the plays made throughout those games led to a comeback in the end.
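As a rough illustration of how the halftime deficit and comeback flag described above could be derived, here is a minimal Python sketch. It assumes a hypothetical per-team, per-game summary with halftime and final scores; the class, field names, and sample games are illustrative and are not the presenters' actual columns or data.

```python
from dataclasses import dataclass

DEFICIT_THRESHOLD = 10  # trailing by 10 or more points at halftime


@dataclass
class GameSummary:
    # Hypothetical per-team, per-game summary; field names are illustrative.
    game_id: str
    team: str
    team_half_pts: int   # team's points at halftime
    opp_half_pts: int    # opponent's points at halftime
    team_final_pts: int
    opp_final_pts: int


def halftime_deficit(g: GameSummary) -> int:
    """Points the team trailed by at halftime (0 if tied or leading)."""
    return max(g.opp_half_pts - g.team_half_pts, 0)


def comeback_flag(g: GameSummary) -> int:
    """1 if the team trailed by >= 10 at halftime and still won, else 0."""
    won = g.team_final_pts > g.opp_final_pts
    return int(halftime_deficit(g) >= DEFICIT_THRESHOLD and won)


games = [
    GameSummary("g1", "IND", 40, 52, 101, 98),  # down 12 at half, won
    GameSummary("g2", "IND", 45, 50, 99, 97),   # down 5 at half, won
    GameSummary("g3", "BOS", 38, 50, 90, 104),  # down 12 at half, lost
]
for g in games:
    print(g.game_id, halftime_deficit(g), comeback_flag(g))
```

Only the first sample game counts as a comeback: the second never trailed by 10, and the third trailed by 10 but lost.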

Now, looking at some of the key variables for the salary data: these as a whole are variables that we decided were going to be important for our analysis. But once again, I wanted to focus on a couple of these variables, as I feel they are more important to point out and explain.

The first of these is the player involvement variable. The player involvement variables account for individual involvement on key plays during comebacks. These plays could include shots, rebounds, fouls, or any of the actionable plays that we see throughout the play-by-play data. We wanted to take individual counts for each player so we could see, for example, how many times a certain player was shooting the ball throughout the seasons, and really be able to compare these players to other players within the league.

Then, moving on to the comeback score. This is a min-max scoring method that we used to score overall player involvement. This is what we used to quantify how involved these players were, and it uses the player involvement variables that I just explained.

I wanted to go a little bit deeper into the comeback score calculation, just to make sure that everyone understands how it was calculated. As I said, it was a min-max scoring method, and it was used to determine the involvement of the players during their team's comeback victories. The min-max method creates the scores by taking each player's involvement relative to the range of values that appear for each variable. It takes the maximum count of each type of play as the maximum, and a minimum that we found would typically be zero, since some players played anywhere from zero minutes all the way up to hundreds of minutes. Typically, the players with zero minutes did not contribute much to these comeback wins.

Below, you see the formula that we used for each player to create the comeback score. A good example is the assist term: the assist count divided by the maximum assist count, times 0.1667, where 0.1667 is 1 divided by the total of the 6 included variables. That is what we call the weight for the formula. Each of these variables was weighted equally: we took the weighted min-max score for each variable, added them together, and multiplied by 100 to get the final score.
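The comeback score formula described above can be sketched in Python as follows. This is a minimal sketch, not the presenters' implementation: the six play-type names are hypothetical placeholders (the talk names assists as one of the six variables, but the slide listing the rest is not in this transcript), and it uses the general min-max form (x - min) / (max - min), which reduces to the count divided by the maximum when the minimum is zero, as described in the talk.

```python
# Hypothetical names for the six equally weighted play-type variables.
PLAY_TYPES = ["shots", "rebounds", "assists", "fouls", "steals", "blocks"]


def min_max(value: float, lo: float, hi: float) -> float:
    """Scale a count into [0, 1] relative to the observed range."""
    if hi == lo:
        return 0.0  # no spread in this variable, so it adds nothing
    return (value - lo) / (hi - lo)


def comeback_scores(counts: dict[str, dict[str, int]]) -> dict[str, float]:
    """Weighted sum of min-max scaled counts, multiplied by 100."""
    weight = 1 / len(PLAY_TYPES)  # 1/6, roughly 0.1667
    lows = {p: min(c[p] for c in counts.values()) for p in PLAY_TYPES}
    highs = {p: max(c[p] for c in counts.values()) for p in PLAY_TYPES}
    return {
        player: 100 * sum(weight * min_max(c[p], lows[p], highs[p]) for p in PLAY_TYPES)
        for player, c in counts.items()
    }


# Hypothetical involvement counts in comeback games
counts = {
    "Player A": {"shots": 10, "rebounds": 8, "assists": 6, "fouls": 4, "steals": 2, "blocks": 2},
    "Player B": {"shots": 0, "rebounds": 0, "assists": 0, "fouls": 0, "steals": 0, "blocks": 0},
    "Player C": {"shots": 5, "rebounds": 4, "assists": 3, "fouls": 2, "steals": 1, "blocks": 1},
}
for player, score in comeback_scores(counts).items():
    print(f"{player}: {score:.1f}")
```

Because the six weights of 1/6 sum to 1, a player who leads every category scores 100 and a player with no involvement scores 0.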

We'll now look at the play-by-play analysis method. With this data, we first began by merging. We had six CSV files, one for each individual year, and we combined them all into one central file so we could look at the play-by-play data from all six years that we had.
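Stacking the yearly play-by-play files into one table could look like the following Python sketch, assuming every file shares the same header. The in-memory strings stand in for the real CSV files, and the column names and the added `season` column are hypothetical.

```python
import csv
import io

# In-memory stand-ins for the yearly CSV files (hypothetical columns).
# With real files you would use open(path, newline="") instead of io.StringIO.
yearly_files = {
    "2015": "game_id,team,event\ng1,IND,shot\n",
    "2016": "game_id,team,event\ng2,BOS,rebound\ng2,BOS,shot\n",
}


def merge_csvs(sources: dict[str, str]) -> list[dict[str, str]]:
    """Stack the rows of every yearly file into one list, tagging each row's year."""
    merged = []
    for year, text in sources.items():
        for row in csv.DictReader(io.StringIO(text)):
            row["season"] = year  # remember which file the row came from
            merged.append(row)
    return merged


def write_csv(rows: list[dict[str, str]], out) -> None:
    """Write the combined rows to a single CSV with one shared header."""
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


rows = merge_csvs(yearly_files)
combined = io.StringIO()
write_csv(rows, combined)
print(combined.getvalue())
```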

We then transformed the data using flag variables. As we said, we created a column that specified whether there was a comeback or not. We first looked at the halftime scores and checked whether teams were trailing by 10 or more at halftime. We would then take those games and see whether a comeback actually occurred. If it did, we flagged the game with a one, and we kept the specific plays that occurred in it.

Then, for the descriptive analysis, we looked at different graphs within Tableau. These included things such as how far from the basket players were shooting, whether they were making or missing their shots, the rebound types, and things like that, to get an idea of what players were doing during the games and whether they were actually producing good outcomes to secure a comeback.

Then, lastly, we did the predictive analysis in JMP Pro using a decision tree, to see which plays and sequences of plays produce a comeback and how we can better look at those in the future so that teams are able to produce more comebacks throughout the season.

Now for the methods for the salary analysis. First off, we had to do some table joins. These joins were necessary to bring all of the data tables together, as we needed them combined to dive into everything as a whole; separately, they weren't much help to us.

Then we moved on to some data transformation. We wrote SQL queries to gather the counts of the key metrics. This is how we got the counts of shots for the various players, along with other things such as rebounds or fouls.
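The table joins and SQL counting queries described above might look something like this sketch, run here against an in-memory SQLite database through Python's built-in `sqlite3` module so it is self-contained. The table names, column names, and rows are hypothetical; the presenters' actual queries and database are not shown in the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE plays (game_id TEXT, player TEXT, event_type TEXT, comeback INTEGER);
CREATE TABLE salaries (player TEXT, season TEXT, salary INTEGER);
INSERT INTO plays VALUES
  ('g1', 'Player A', 'shot', 1),
  ('g1', 'Player A', 'rebound', 1),
  ('g1', 'Player B', 'shot', 1),
  ('g2', 'Player A', 'shot', 0);
INSERT INTO salaries VALUES
  ('Player A', '2017', 10000000),
  ('Player B', '2017', 2000000);
""")

# Count each player's shots and rebounds in comeback games only, then join
# the counts to salary. In SQLite a comparison evaluates to 1 or 0, so
# SUM(condition) counts matching rows.
query = """
SELECT p.player,
       SUM(p.event_type = 'shot')    AS shots,
       SUM(p.event_type = 'rebound') AS rebounds,
       s.salary
FROM plays AS p
JOIN salaries AS s ON s.player = p.player
WHERE p.comeback = 1
GROUP BY p.player, s.salary
ORDER BY p.player;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With several seasons of salary data, the join would likely also need to match on season so each player's counts line up with that year's salary.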

Then we moved on to some descriptive analysis, which was completed in Tableau. With this descriptive analysis, one of the key things we were looking at was the comeback scores, both actual and predicted, versus the salary of the players, so we could see just how well they're performing relative to their salary.

Then, finally, we had a predictive analysis: a linear regression completed in JMP Pro. I will go into a little more detail about that later on.

Now we're going to jump into the descriptive and predictive analyses that we conducted, beginning with the descriptive analysis.

Here, we want to look at salary versus comebacks for each NBA team. If you look at the data points, you can see that most teams follow the trend line, meaning that as they spend more money on salary, they also produce a greater number of comebacks.
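To make the idea of a trend line concrete, here is a minimal Python sketch of an ordinary least-squares line and Pearson correlation for team payroll versus comebacks. The payroll and comeback numbers are made up for illustration and are not the study's data; Tableau's own trend-line options may differ in detail.

```python
from math import sqrt


def trend_line(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Ordinary least-squares slope and intercept, plus Pearson r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical team payrolls (in $ millions) and comeback counts
payroll = [80.0, 95.0, 110.0, 125.0, 140.0]
comebacks = [6, 8, 9, 11, 12]

slope, intercept, r = trend_line(payroll, comebacks)
# Residual = actual comebacks minus what the trend line predicts for that payroll
residuals = [c - (intercept + slope * p) for p, c in zip(payroll, comebacks)]
print(f"comebacks ~ {intercept:.2f} + {slope:.3f} * payroll, r = {r:.3f}")
print([round(e, 2) for e in residuals])
```

A team with a large positive residual is producing more comebacks than its payroll alone would suggest, which is the pattern the talk highlights for the Pacers.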

You can see that the Boston Celtics, toward the top, had the most comebacks at 14, while the Cleveland Cavaliers had the highest salary paid but also one of the fewest comebacks, with only five. What we thought was most interesting was the Indiana Pacers, because not only did they pay such a low salary, but they were also able to produce 12 comebacks, which is the third most in the NBA. I wanted to focus on the Indiana Pacers and see what exactly they were doing that allowed them to produce such a high number of comebacks with such a low payroll.

As Weston said, we wanted to focus on the Indiana Pacers. Here, we see the salary of Pacers players versus their individual comeback scores. Several highly scored players are found on the Indiana Pacers roster, as you can see with Myles Turner, Collison, Young, George, and Oladipo. The top-scored players are spread across the salary spectrum. You see some cheaper players, such as Myles Turner, and Collison as more of a mid-range salary player. Then you also have more expensive players, such as Paul George or Victor Oladipo, further toward the top right of the graph.

So you can really see how they've spread the wealth across the roster: they are getting maximum performance out of their highly paid players, but also finding performance from lower-paid players. You can also see that they have several middle-tier players who come into play and provide big help to the Pacers, as they need some players to come off the bench and make key plays that produce comebacks.

As I said, one of the key points that I want to point out is that Victor Oladipo, the highest-paid player on the team, is also the highest performing in terms of comeback score, so they're definitely getting their money's worth out of him as a player.

All right, so now we're going to get into our predictive analysis. To begin with the play-by-play data, we decided to build a decision tree to predict the play types that lead to the Indiana Pacers producing a comeback, using the variables that you can see below. We'll see that only a couple of these variables actually had a big impact in predicting whether the Pacers would come back from 10 or more points down.

In the decision tree, there are two nodes in particular. One is where the shot distance was greater than or equal to 26 feet from the basket and they were making those shots; the other is a shot distance of greater than or equal to 3 feet from the basket, meaning that they're looking at more of a layup option.
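The two nodes described above come from JMP Pro's partition (decision tree) platform. As a hedged illustration of what a single split does, the Python sketch below searches for the shot-distance cutoff that best separates comeback and non-comeback rows. It swaps in Gini impurity as the split criterion for simplicity; JMP ranks candidate splits by LogWorth instead, so this is not a reproduction of the presenters' model. The distances and labels are hypothetical.

```python
from typing import Optional


def gini(labels: list[int]) -> float:
    """Gini impurity of a list of 0/1 comeback labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_threshold(distances: list[float], labels: list[int]) -> tuple[Optional[float], float]:
    """Find the cutoff t for the rule 'distance >= t' that minimizes the
    size-weighted Gini impurity of the two resulting groups."""
    n = len(labels)
    best_t: Optional[float] = None
    best_impurity = float("inf")
    for t in sorted(set(distances)):
        right = [y for d, y in zip(distances, labels) if d >= t]
        left = [y for d, y in zip(distances, labels) if d < t]
        if not left or not right:
            continue  # a split must produce two non-empty groups
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best_impurity:
            best_t, best_impurity = t, impurity
    return best_t, best_impurity


# Hypothetical made-shot distances (feet) and comeback flags
distances = [1, 2, 5, 8, 12, 15, 20, 26, 27, 28]
labels = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
print(best_threshold(distances, labels))  # (26, 0.0)
```

A full tree repeats this search within each resulting group, over every candidate variable, until a stopping rule is met.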

Now we're going to look at those two nodes in more detail. These branches, as I said, predict that the Pacers produce comeback victories. The overall model had a validation misclassification rate of 45.97%. As I said, the model predicts that made shots of 26 feet and further, and made layups 3 feet or further from the basket, lead to comebacks.

What we would say is that they should really focus on the three-point aspect and on higher-percentage shots such as layups, because, as you can see in both of those nodes, the prediction was one, which in this case means that the Pacers were able to produce a comeback. For the 26-feet-and-further node, the probability that the comeback flag equalled 1 was 62.75%. Then, when they were shooting layups 3 feet or further from the basket, the probability of a comeback was 75.9%.
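The leaf probabilities (62.75% and 75.9%) and the validation misclassification rate (45.97%) reported above are both proportions. The sketch below shows those calculations in Python on hypothetical labels; it uses raw proportions, and the figures JMP reports could differ slightly if the software applies any adjustment to leaf probabilities.

```python
def leaf_probability(labels: list[int]) -> float:
    """Share of rows in a leaf with comeback == 1."""
    return sum(labels) / len(labels)


def predicted_class(prob: float, cutoff: float = 0.5) -> int:
    """Turn a leaf probability into a 0/1 prediction."""
    return int(prob >= cutoff)


def misclassification_rate(actual: list[int], predicted: list[int]) -> float:
    """Fraction of rows whose predicted class differs from the actual class."""
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return wrong / len(actual)


leaf = [1, 1, 1, 0]  # hypothetical rows that fall into one leaf
prob = leaf_probability(leaf)
print(prob, predicted_class(prob))  # 0.75 1

actual = [1, 0, 1, 1, 0]       # hypothetical validation labels
predicted = [1, 1, 1, 0, 0]    # hypothetical model predictions
print(misclassification_rate(actual, predicted))  # 0.4
```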

Then, as I said, there were two variables that seemed the most important, of the 10 that we looked at, in predicting whether the Pacers were able to produce a comeback. The first was shot distance, which looked at how far players shot the ball, and the second was shot outcome, which is whether they made or missed the shot. With distance, as I said, that means 26 feet or further, which is about three-point range, or the higher-percentage shots in the paint, such as a layup. And if you're making more shots, you're producing a higher score, giving you a better chance of coming back.

All right, so now moving into the linear regression portion of our predictive analysis. This regression, as I said, was completed in JMP Pro, and it was done to predict the comeback scores of individual players based on the variables that you see there on screen. A couple of the key ones to point out are individual player salaries, team name, and career statistics, as you see with all those different variables there. It's also important to note that the variables were selected for this regression based on their level of significance: if a variable was not found to be significant, it was not included in the regression.
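A linear regression like the one described above can be sketched in plain Python by solving the normal equations. This is a minimal sketch under stated assumptions: the two numeric predictors (salary in $ millions and career rebounds in thousands) and the sample values are hypothetical, a categorical predictor such as team would first need to be dummy-coded, and the significance-based variable selection JMP Pro performed is not reproduced here. The `r_squared` helper computes the RSquare statistic reported in JMP's summary of fit.

```python
def ols(X: list[list[float]], y: list[float]) -> list[float]:
    """Fit y = b0 + b1*x1 + ... by solving the normal equations (X'X) b = X'y."""
    rows = [[1.0] + list(r) for r in X]  # prepend an intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def r_squared(X: list[list[float]], y: list[float], b: list[float]) -> float:
    """Share of the variance in y explained by the fitted model."""
    preds = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, preds))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1 - ss_res / ss_tot


# Hypothetical predictors: [salary in $M, career rebounds in thousands]
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [6.0, 8.5, 13.0, 15.5, 19.5]  # hypothetical comeback scores
b = ols(X, y)
print([round(v, 3) for v in b], round(r_squared(X, y, b), 3))
```

The toy scores were built to lie exactly on a plane, so the fit recovers the coefficients and RSquare is 1; real comeback scores would scatter around the fitted values, which is what produces a low RSquare.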

Going into the summary of fit for this linear regression, I do want to point out that it has a low RSquare, but this is not a primary concern for our analysis. We knew that the comeback score is built from the player involvement statistics, but we wanted to know what the score would be based on completely different variables. So instead of using the variables that we used to create the statistic initially, we're now using new variables, like I said, salary and career stats, to predict what it should be. That means the predictions would vary from the original scores, and that was not only expected in our analysis but desired, so that we could see how players were supposed to perform.

Now, based on this analysis, we were able to identify some of the most important variables. The first, and the most important, was salary. We saw that higher-paid players were predicted to be more involved in comebacks, which is what you would expect in the actual NBA: players like Victor Oladipo or LeBron James, with higher salaries, would be expected to perform better than those with lower salaries.

Moving on from there, we also have team. This one definitely makes sense: some of the top teams in predicting comeback scores are the Golden State Warriors and the Indiana Pacers, which are a couple of the teams that we saw had the highest number of comeback victories over the seasons that we were looking at.

We also had a couple of career stats that really popped up as some of the most important variables for this regression. The first was the players' career total rebounds, followed by career points. For players with higher career total rebounds and higher career points, we expected them to produce more value when it came to creating comeback victories. I'll also note that the importance of these variables was measured using LogWorth.
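JMP reports LogWorth as -log10(p-value), so a p-value of 0.01 has a LogWorth of 2, and larger values indicate more significant effects. A small Python sketch of that conversion, using hypothetical p-values rather than the study's results:

```python
import math


def logworth(p_value: float) -> float:
    """LogWorth = -log10(p-value); larger values mean stronger evidence."""
    return -math.log10(p_value)


# Hypothetical p-values for illustration only
effects = [("salary", 1e-6), ("career rebounds", 0.001), ("career points", 0.02)]
ranked = sorted(effects, key=lambda e: logworth(e[1]), reverse=True)
for name, p in ranked:
    print(f"{name}: LogWorth {logworth(p):.2f}")
```

Sorting by LogWorth is equivalent to sorting by p-value from smallest to largest; the log scale just makes very small p-values easier to compare.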

Now we're going to look at the conclusions of the presentation.

Okay, so going into some of the Indiana Pacers predictions, I specifically want to point out some of the Pacers' top performers and underperformers. The blue dots that you see there mark the Pacers' actual top performers that we saw in the earlier graph of actual comeback score versus salary; now we are looking at the predicted comeback score versus salary. The orange Xs mark the Pacers' underperformers.

In this graph, the underperformers are predicted to score much higher than their teammates, whereas with their actual scores, they find themselves toward the middle to lower end of the pack. That really shows us that they're not performing up to what their salary and career statistics say they should be, particularly when it comes to creating a comeback victory.
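One way to formalize the comparison described above, where a player's predicted standing among teammates is much higher than their actual standing, is a rank comparison within the team. The Python sketch below is hypothetical: the player labels, scores, and the two-place gap cutoff are illustrative, and the talk does not say exactly how underperformers were chosen.

```python
def ranks(scores: dict[str, float]) -> dict[str, int]:
    """Rank players within a team: 1 = highest score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {player: i + 1 for i, player in enumerate(ordered)}


def underperformers(actual: dict[str, float], predicted: dict[str, float],
                    min_gap: int = 2) -> list[str]:
    """Players whose actual rank is at least `min_gap` places worse than
    their predicted rank (min_gap is a hypothetical cutoff)."""
    a, p = ranks(actual), ranks(predicted)
    return sorted(name for name in actual if a[name] - p[name] >= min_gap)


# Hypothetical actual and predicted comeback scores for one roster
actual = {"P1": 80, "P2": 60, "P3": 40, "P4": 20}
predicted = {"P1": 70, "P2": 30, "P3": 20, "P4": 75}
print(underperformers(actual, predicted))
```

Comparing ranks rather than raw scores matches the "relative to their teammates" framing and sidesteps the fact that a low-RSquare model's predictions sit on a different scale from the actual scores.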

But it is important to point out that the team has done a great job of signing inexpensive players who produce comeback wins. We see players such as Myles Turner, Collison, or Young who have somewhat lower salaries but also produce a lot of plays that can help create a comeback win.

We also wanted to point out some of the Cleveland Cavaliers predictions and the shortfalls that go with them. The Cavs are predicted to have multiple high-tier comeback players. One to point out specifically is Kevin Love. Kevin Love is at the top of the graph, and he has both an orange X and a blue mark next to his name. That means he was one of the actual top performers for the Cavaliers, but at the same time, he's greatly underperforming.

In our predictions, we can see that he's predicted to perform better than LeBron James, which is very interesting to point out. Like I said, based on salary and career stats, we would expect Kevin Love to outperform LeBron James when it came to producing comeback wins. But in reality, while he remains one of the top performers, he's quite far down the list and does not produce nearly as much as LeBron James does.

Now we also wanted to look at the best-value players in the NBA. We show the top five here. Looking at their predicted comeback scores, you have players such as Karl-Anthony Towns, Joel Embiid, and Ben Simmons, who were predicted to be some of the higher-performing players in the entire league. But as you can also see, when this data was taken, they had relatively low salaries compared to other players. What we recommend here is giving these players the contracts they deserve, as they help teams produce comebacks and provide statistics that allow teams to perform their best. They are producing more than what they are currently being paid.
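The idea of a best-value player, a high predicted comeback score on a relatively low salary, could be expressed as predicted score per $1 million of salary. This ratio is a hypothetical metric for illustration only; the talk does not specify how the top five were ranked, and the sample numbers are made up.

```python
def value_ranking(predicted: dict[str, float], salary_musd: dict[str, float],
                  top_n: int = 5) -> list[str]:
    """Rank players by predicted comeback score per $1M of salary
    (a hypothetical value metric)."""
    ratio = {p: predicted[p] / salary_musd[p] for p in predicted}
    return sorted(ratio, key=ratio.get, reverse=True)[:top_n]


# Hypothetical predicted scores and salaries (in $ millions)
predicted = {"A": 60.0, "B": 70.0, "C": 30.0}
salary_musd = {"A": 6.0, "B": 35.0, "C": 2.0}
print(value_ranking(predicted, salary_musd, top_n=2))
```

A player like "B" can have the highest predicted score yet rank last on value because of a large salary, which is the distinction the recommendation above turns on.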

Then we also wanted to look at the best lineup predictions for the Pacers. As I mentioned earlier, there's that emphasis on three-pointers and high-percentage shots, so build a lineup of shooting threats from distance. You have players such as Robinson, Joseph, and George, whose average shot distance is about 15 feet and beyond, with the three-point line at about 24 feet, which suggests they take a lot of long-range shots, and they're also making them.
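The lineup reasoning above rests on two per-player numbers: how far made shots travel on average and how often shots go in. A minimal Python sketch of both, assuming a hypothetical list of (player, distance in feet, made) shot records rather than the actual play-by-play columns:

```python
from collections import defaultdict


def shooting_profile(shots: list[tuple[str, float, bool]]):
    """Return (average made-shot distance, make rate) per player."""
    made_dist = defaultdict(float)
    made = defaultdict(int)
    attempts = defaultdict(int)
    for player, distance, is_made in shots:
        attempts[player] += 1
        if is_made:
            made[player] += 1
            made_dist[player] += distance
    avg_made = {p: made_dist[p] / made[p] for p in made}  # only players with a make
    make_rate = {p: made[p] / attempts[p] for p in attempts}
    return avg_made, make_rate


# Hypothetical shot records: (player, distance in feet, made?)
shots = [("P1", 25, True), ("P1", 5, True), ("P1", 24, False),
         ("P2", 2, True), ("P2", 3, False)]
avg_made, make_rate = shooting_profile(shots)
print(avg_made, make_rate)
```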

Not only are they shooting from that far, but they're also more likely to make their shots, so those players would be good to have in the lineup whenever you are trying to produce a comeback, as they're more efficient. Also, because they can shoot from deep, they should open up solid opportunities down low to get a quick layup and get those higher-percentage shots to go in as well.

As I mentioned, an average made-shot distance near the three-point line is very important for the Pacers in particular to be able to produce a high number of comebacks.

This analysis is consistent with what is already going on in the NBA. Typically, teams that find themselves down by a certain number at halftime will throw up a few more three-point shots, but they don't really focus on that high-percentage look down low at the basket. We also think they should focus on drawing up plays that allow them to get a quick layup and build momentum from it as they try to produce a comeback later on.

All right, so that wraps up our presentation. We just want to say a quick thank you, and this is where we would open it up to questions.