What is Most Important in Determining Heart Disease and Stroke? (2022-US-30MP-1170)

Brittany Burlison, Student, Oklahoma State University
Kailey Wilson, Student, Oklahoma State

 

Heart disease and strokes are two major diseases that have been around for years without a cure. Heart disease is the leading cause of death in the United States, resulting in one death every 36 seconds. Of these deaths, one in six people die due to a stroke, which is also the leading cause for long-term disabilities.

 

For our research project, we explore whether these two major diseases have common factors that can predict each other. First, we built a logistic regression model for each disease. Next, we made a new variable, which returns 1 if the person has both diseases and 0 if not. Finally, we did a final analysis to see which variables in these two models can predict both diseases in one equation. From our research, we identified that the variables general health, diabetes and health coverage are the most useful in determining whether or not a person will suffer from heart disease or a stroke in their lifetime.

 

 

Hi,  my  name  is  Brittany  Burlison, and  my  copresenter  is...

I'm  Kailey  Wilson.

We  are  both  second  year  master's students  at  Oklahoma  State  University,

getting  a  master's  in  business analytics  and  data  science.

Today,  we  are  going  to  present our  research  in  what  is  most  important

in  determining  heart  disease  and  stroke.

We  will  be  going  over our  research  overview,

the  methods  that  we've  used in  our  data  overview,  our  data  analysis,

and  results  and  implications, and  what  we've  done  in  JMP.

Heart  disease  and  strokes   are  two  major  diseases

that  have  been  around  for  years, and  there's  still  no  cure  for  them.

Heart  disease  is  a  leading cause  of  death  in  the  United  States.

A person dies every 36 seconds from heart disease.

Of  these  deaths, one  in  six  die  due  to  a  stroke,

and  strokes  are  the  leading cause  for  long-term  disabilities.

For  our  research,  we  are  looking  to  see if these  two  major  diseases

have  any  common  factors  that  will  be  able to  predict  each  other.

We  are  interested  in  seeing  what  factors are  most  important  in  determining

whether  a  person  will  suffer  from  stroke or  heart  disease  in  their  lifetime.

We want to take variables

that relate to the social determinants of health to see which variables play

a bigger role in determining these major health issues.

For our analysis,

we will be using the data

from the Behavioral Risk Factor Surveillance System, BRFSS for short,

from the CDC.

This  is  a  phone  survey  that  collects  data from  citizens  regarding

a  plethora  of  information.

We  will  be  using  data  from  2016  to  2020.

This  contains  over  500  fields and  over  2  million  observations.

Some  of  the  fields  contain  information about  households,

current  health  conditions,  behaviors  and  demographics.

Additionally, some states have the option to ask more specific health questions,

and those are considered too.

We  will  be  looking  at  the  variables that  people  are  asked.

Now for the methods and plans that we are going to use.

Our data set contains over 500 variables, as we mentioned,

so  we  have  narrowed  that  list down  to  11  that  we  have  deemed

the  most  important  in  determining heart  disease  or  stroke.

We  have  referenced the  social  determinants  of  health

to  help  us  make  this  decision on  which  variables  we  should  keep.

And we  have  determined  a  few that  we'll  go  over  in the  next  slide.

So we're using JMP, specifically the Fit Model platform in JMP

and Graph Builder.

The factors that we are considering are a person's sex, their age, and their race.

So  for  our  variable  selection,

we  have  determined  that  income,  housing, education,  mental  health, health  coverage,

overall  general  health,  smoking  status, diabetes  state,  divorce,  and  medical  costs

were  the  most  important variables  to  look  at.

We  will  be  using  stroke  and  heart disease  as our  response  variables.

We  will  look  at  these  variables by  gender  using  the   sex  variable.

Then we will concatenate all five years of our data in JMP

and run Fit Model to determine which preselected variables

are the most important in determining heart disease and stroke.
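The concatenation step described here can be sketched outside JMP as follows. This is only an illustration in Python, not the presenters' JMP workflow: the column names in `KEEP`, the helper `concatenate_years`, and the sample rows are all hypothetical, and real BRFSS field names differ and can change between years.

```python
# Hypothetical column names; real BRFSS field names differ and vary by year.
KEEP = ["sex", "general_health", "diabetes", "smoker", "heart_disease", "stroke"]

def concatenate_years(tables_by_year, keep=KEEP):
    """Stack per-year row lists into one list, keeping only the selected columns."""
    combined = []
    for year in sorted(tables_by_year):
        for row in tables_by_year[year]:
            record = {"year": year}
            for col in keep:
                # A field missing in a given year becomes None instead of failing.
                record[col] = row.get(col)
            combined.append(record)
    return combined

# Made-up rows: two respondents in 2016, one in 2017.
tables = {
    2017: [{"sex": 2, "general_health": 3, "diabetes": 3, "smoker": 0,
            "heart_disease": 2, "stroke": 2, "extra": "x"}],
    2016: [
        {"sex": 1, "general_health": 2, "diabetes": 1, "smoker": 1,
         "heart_disease": 1, "stroke": 2},
        {"sex": 2, "general_health": 1, "diabetes": 3, "smoker": 0,
         "heart_disease": 2},
    ],
}
combined = concatenate_years(tables)
```

Tagging each row with its survey year keeps the source of every observation visible after the five files are stacked.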

Kailey will go over our data analysis and what we have found.

Thank you, Brittany.

The  first response  variable that  we  looked  at  is  heart  disease.

When  sex  is  1, that  means  it's  a  male.

As we can see in our output,

the most important variables, based on their LogWorth,

were general health, diabetes, and if they were a smoker.

Even though the RSquare is pretty low, which means that only 8% of the variation in the data

is explained by these variables, since the p-value is very small

that means that the variables that we have selected are very significant.
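The two summary numbers mentioned here can be made concrete with a small sketch. It assumes that LogWorth is -log10 of the p-value, and that the RSquare in the output is the entropy RSquare (U) that JMP reports for nominal logistic fits, that is, one minus the ratio of the fitted model's log-likelihood to the intercept-only log-likelihood. The function names and the input numbers are made up for illustration and are not taken from the presenters' output.

```python
import math

def logworth(p_value):
    """LogWorth = -log10(p-value); a larger value means a smaller p-value."""
    return -math.log10(p_value)

def rsquare_u(loglik_full, loglik_null):
    """Entropy R-square: 1 - LL(fitted model) / LL(intercept-only model)."""
    return 1.0 - loglik_full / loglik_null

# Illustrative inputs only (not the presenters' results).
lw = logworth(0.0001)            # p = 0.0001
r2 = rsquare_u(-920.0, -1000.0)  # fitted vs. intercept-only log-likelihood
```

A low RSquare and a very small p-value can occur together: with a very large sample, even effects that explain little of the variation can be detected with high confidence.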

We can see the same when it's a female.

Similarly,  the  most  important  variables are  general  health,  diabetes,

and  if  they  smoke  or  not.

We can come to a similar conclusion

that the RSquare is very low,

which makes sense since there are 500 variables.

But the variables that were selected are still very significant.
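For readers who want to see what a logistic fit of this kind does, here is a minimal one-predictor sketch in plain Python. It is not the JMP Fit Model implementation: the fitting method (simple gradient ascent), the function `fit_logistic`, the numeric health score, and the data are all assumptions made for illustration, and the response is recoded to 1 = has heart disease, 0 = does not.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by gradient ascent on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            g0 += y - p          # gradient with respect to the intercept
            g1 += (y - p) * x    # gradient with respect to the slope
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Made-up data: x is a hypothetical numeric health score (higher = worse),
# y is 1 if the respondent reported heart disease.
xs = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
ys = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(xs, ys)
p_low = sigmoid(b0 + b1 * 1)   # predicted probability at the best score
p_high = sigmoid(b0 + b1 * 5)  # predicted probability at the worst score
```

In this toy data the fitted slope is positive, so the predicted probability rises as the score worsens. A real analysis would include all the selected predictors and treat categorical ones as categories.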

Next,  we  wanted  to  look  at   what  heart  disease  looked  like

based  on  general  health.

So  general  health  was  a  variable that  was  split  into  nine  buckets.

One  being  excellent  health and  nine  being  very  poor.

So  we  can  see  that.

When  heart  disease  is  one, that  means  that  they  had  heart  disease,

and  when  it's  two,  that  means they  did  not  have  heart  disease.

As  we  can  see, when  general  health  is  two  or  three,

which  means  very  good general  health  or  good  general  health,

those two had the highest number of heart disease cases.
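The kind of tally behind a chart like this can be sketched as below. The sketch uses the coding given in the talk for heart disease (1 = has it, 2 = does not); the function name and the rows are hypothetical. It returns both the number of cases and the share of respondents with heart disease in each general health bucket, since a bucket with more respondents can show more cases even when its share is lower.

```python
from collections import Counter

def tally_by_general_health(rows):
    """Return {general_health: (case_count, case_share)} for (health, heart_disease) rows."""
    totals = Counter()
    cases = Counter()
    for general_health, heart_disease in rows:
        totals[general_health] += 1
        if heart_disease == 1:  # 1 = has heart disease, 2 = does not
            cases[general_health] += 1
    return {g: (cases[g], cases[g] / totals[g]) for g in sorted(totals)}

# Made-up rows of (general_health, heart_disease).
rows = [(1, 2), (2, 1), (2, 2), (2, 2), (2, 1), (3, 1), (3, 2), (5, 1)]
summary = tally_by_general_health(rows)
```

In this made-up sample, bucket 2 has the most cases (two of four respondents), while bucket 5 has a single respondent who has heart disease, so its share is higher despite the lower count.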

Next,  we  wanted  to  look  at  stroke.

For  stroke,  for  a  female,

the  most  important  variables, out  of  the  variables  we  selected

were  diabetes,  general  health, and  then  education.

Similarly, we have a very low RSquare,

but our significance or our p-value is very small,

which means that all of these variables are still very significant.

Similarly,  for  males,

the  most  important  variables are  diabetes,  general  health.

Then the RSquare for this one is the smallest RSquare we have seen,

but we still have a p-value of less than...

A very small p-value, which means it is still very significant.

Similarly, as we did for heart disease, we built a graph too

based  on  general  health,

to  look  at  where  stroke  fell in  the  general  health  response.

And  the  general  health  it  falls into is,  again,  two  and  three,

which  means  people  with  very  good  health,

or  good  health, are  most  likely  to  have  stroke.

Then we went and we created our own variable:

when someone had both heart disease

and stroke, it would return a value of one,

and when they didn't have both, it would be zero.
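The combined variable described here can be sketched in a few lines. The sketch assumes stroke uses the same coding the talk gives for heart disease (1 = yes, 2 = no) and treats any other or missing code as "not both"; both points are assumptions, and the function name is hypothetical.

```python
def both_diseases(heart_disease, stroke):
    """Return 1 if the respondent reported both conditions (code 1 = yes), else 0.

    Assumption: any other code, including a missing value, counts as 0.
    """
    return 1 if heart_disease == 1 and stroke == 1 else 0

# Made-up respondents as (heart_disease, stroke) codes.
respondents = [(1, 1), (1, 2), (2, 1), (2, 2), (1, None)]
flags = [both_diseases(h, s) for h, s in respondents]
```

Only the first respondent, who reported both conditions, gets a value of one.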

So  here  we  can  see  for  heart disease  and  stroke,   the  most  important  variables

are  general  health,  diabetes, income,  if  they  smoke,  and  education.

This RSquare is our highest RSquare, which is really good.

This means that most of the data is represented in this

and our p-value is still very small,

which means that all of these are significant.

Then again, we made a graph to see where general health fell.

We  can  see  that  for  when  someone  has heart  disease  and  stroke,

it  falls  in  three, which  is  good  general  health.

Our conclusion is that the most important variables

in determining whether or not a person will have heart disease

are general health, diabetes, smoking,

and if their parents are divorced, and that was for the males.

Then  for  females,  it's  general  health, diabetes,  smoking  and  income.

Then, looking at stroke.

For  a  female,  it's  diabetes, general  health  and  education.

In males,  it  is  diabetes, general  health  and  health  insurance.

Then  for  both  of  them  combined,

the  most  important  ones are  general  health,  diabetes,

income,  if  they  smoke, and  their  education.

So  drawing  to  a  close, our  overall  implications.

We  would  say, to  help  prevent  heart  disease,

people  should  improve  their  overall general  health,  monitor  their  diabetes,

decrease  their  nicotine use,  etc.

Then  to  help  prevent  stroke,

people  should  improve their  general  health,

monitor  their  diabetes  as  well,

and  think  about improving  their  health  care  plan.

Then  overall,   people  should  just  focus  on

their  general  health  to  prevent  heart disease  and  stroke  and  any  other  diseases.

We  believe  that  doctors and  healthcare  providers,

if  they  take  this into  consideration,

these  are  super  important  factors in  determining  whether  a  person

will suffer  from  heart  disease  or  stroke in  their  lifetime,

and  they  will  be  able  to  provide better  health  care  options

to  their  patients.

Additionally, we feel that if the general public

takes these factors into consideration, it can help reduce the risk of stroke

or heart disease overall.

We  thank  you  for  listening to  our  research,

and  if  you  have  any  questions, please  let  us  know.

Thank  you.