Improved Heart Failure Prediction Using Model Screening Platform (2022-US-EPO-1137)

Cardiovascular disease is the number one cause of death globally, claiming an estimated 17.9 million lives in 2019, accounting for 32% of all deaths worldwide that year.

 

Heart failure is a common consequence of cardiovascular disease, and this dataset contains 11 features that can be used to predict likely heart disease. The prediction results can help people with cardiovascular disease or at high cardiovascular risk (due to the presence of one or more risk factors, such as hypertension, diabetes, hyperlipidemia, or established disease) to detect early symptoms and disease risk in a timely manner.

 

The data set includes 918 participants from different countries and 11 factors associated with heart failure, such as age, sex, blood pressure, and blood glucose. This study plans to analyze the data set with different models in JMP software, such as neural networks, logistic classifiers, and Random Forest. The optimal prediction model is selected by comparing model performance.

 

The model output will help people understand the importance of different factors leading to heart disease and the probability of developing heart disease under certain conditions, to help people pay more attention to the management of physical health in daily life and the prediction of disease risk.

 

 

Hello. Good morning, everyone.

This is Saijac Lami, and I have my teammate, Zhe Diao.

We are business analytics graduate students from the University of Connecticut, Stamford campus.

A little about our exposure to JMP: we used JMP extensively in our Prediction Model course in our first semester.

We found it a very easy and very powerful tool, and there is a lot we can do with it.

We are still exploring the many features of JMP.

Today, we are here to present the work we did during the summer: heart failure prediction using the Model Screening platform.

We call it improved because we use several JMP platforms to strengthen our predictions.

Coming to today's agenda, this is an overview slide that gives you the gist of what we are doing.

It is followed by three slides where we talk about the pre-processing, some EDA that we have done, and the modeling.

Coming to the introduction, we know that cardiovascular disease is the number one cause of death globally, claiming an estimated 17.9 million lives in 2019.

That accounted for around 32% of deaths worldwide that year.

For our problem, we took a Kaggle data set and developed a classification model for classifying heart disease.

We also strengthen these predictions using the Model Screening, Model Comparison, and dashboard features in JMP 16.

The model output will help in understanding the importance of the factors that lead to heart disease.

We also find the probability of developing heart disease under certain conditions.

To summarize, our objective is to build the best model and find the factors that lead to heart failure using the JMP 16 platform.

Coming to the methodology and a little about our data set: the data set includes 918 participants from different countries.

There are 11 factors associated with heart failure, such as age, sex, blood pressure, and blood glucose.

To get to our predictions, we first pre-processed the data by checking for missing values and outliers.

Further, we performed EDA to understand the importance of each feature and its relationship to heart failure.

To build the model, we incorporated the following JMP 16 capabilities in our methodology.

The first is Model Screening, an efficient platform for simultaneously fitting, comparing, exploring, selecting, and then deploying the best predictive model.

Next comes Model Comparison, an easy platform for comparing and selecting the best-performing predictive model.

Next comes the dashboard, an efficient way to present our EDA concisely.

We can rerun it any time new data becomes available.

Coming to our results, this is just an overview of the results we had.

Using Model Screening, we identified the Boosted Tree as our best model.

We chose it not just on accuracy.

We also focused on which model has the lowest false negative rate, because we do not want patients who have heart failure to go undetected.

Based on this, we chose the Boosted Tree as our best model.

Coming to the column contributions, we tried to identify the important factors driving the heart failure prediction.

Using the Boosted Tree, we identified that Exercise Angina (whether a person has exercise-induced chest pain), Fasting BS (the fasting blood glucose level), Resting ECG, ST_slope, and Chest pain type are a few parameters out of the 11 that together contribute around 75% to the heart failure prediction.

Next comes a more in-depth analysis, which Zhe Diao will take care of.

Okay. After screening the basic information of the data, such as the target feature, predictor variables, and data types, our analysis starts with data cleaning.

We need to deal with missing values and outliers first to get clean data.

JMP provides a variety of ways to explore and deal with them.

For missing values, JMP can display the details in a summary table or display them in a cell plot or tree map.

Today, we show the statistical table here, which is also a way to get the information we want.

We can see that there are no missing values in our data.

But when you explore the data distributions further, you will find that some indicators use the number zero in place of a missing value.

We treat these zeros with deletion and median replacement, because the value of these indicators cannot be zero.
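As an editorial illustration of the median replacement described above, here is a minimal Python sketch. It is not the JMP workflow used in the talk; the function name `replace_zeros_with_median` and the sample `readings` are hypothetical, and it assumes that a zero in the column always stands for a missing value.

```python
from statistics import median


def replace_zeros_with_median(values):
    """Treat zeros as missing and replace them with the median of the non-zero values."""
    observed = [v for v in values if v != 0]
    if not observed:
        raise ValueError("no non-zero values to compute a median from")
    fill = median(observed)
    return [fill if v == 0 else v for v in values]


# Hypothetical lab readings; the zeros stand in for missing measurements.
readings = [289, 0, 180, 214, 0, 195]
cleaned = replace_zeros_with_median(readings)
print(cleaned)  # [289, 204.5, 180, 214, 204.5, 195]
```

The median is computed from the non-zero readings only, so the placeholder zeros do not pull the fill value down.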

For outliers, the box plot and the Explore Outliers module are common methods.

Today, we use the Outlier Analysis function in the Multivariate module, which reflects distances in multi-dimensional space in this Mahalanobis distance graph.

We have retained these outliers in this analysis because we consider them a common phenomenon in medical test results.
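To make the idea of a Mahalanobis distance concrete, the following is a small Python sketch for two variables only. It is an illustration under stated assumptions, not JMP's implementation: the function `mahalanobis_2d` and the sample `points` are hypothetical, the sample covariance (with n - 1 in the denominator) is assumed, and no outlier cutoff is applied because the talk does not state one.

```python
from math import sqrt
from statistics import mean


def mahalanobis_2d(points):
    """Return the Mahalanobis distance of each (x, y) point from the sample centroid."""
    n = len(points)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    # Entries of the 2x2 sample covariance matrix S (n - 1 in the denominator).
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d^2 = [dx dy] * inverse(S) * [dx dy]^T, with the 2x2 inverse written out.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(sqrt(max(d2, 0.0)))
    return distances


# Hypothetical (age, maximum heart rate) pairs, for illustration only.
points = [(40, 172), (49, 156), (37, 98), (48, 108),
          (54, 122), (39, 170), (45, 170), (58, 99)]
distances = mahalanobis_2d(points)
for point, d in zip(points, distances):
    print(point, round(d, 2))
```

Points with a large distance are far from the bulk of the data once the correlation between the variables is taken into account. Whether to drop or keep them remains a judgment call, as in the talk.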

After completing these steps, we get the clean data, and then we enter the data exploration stage.

In this step, we built some commonly used charts to show some of the information contained in the data.

JMP gives us many choices here, such as the tree map, ring chart, bar chart, and so on.

From these graphs, we know that the proportion of males suffering from heart failure is twice that of females.

80% of patients with heart failure have diabetes, and 77% have no symptoms of chest pain, which reveals how imperceptible the disease can be.

After drawing these useful conclusions, we come to the modeling stage to further explore the relationships in the data.

When you are doing data analysis, you may often wonder which model you want to build, or which model performs best.

The Model Screening function in JMP helps us solve this problem in a very intuitive way.

You just need to drag the target variable and the predictor variables to the corresponding positions, and JMP will run all the appropriate models for you.

In this analysis, JMP ran nine models automatically, including a regression model, Boosted Tree, neural network, and so on.

You get a detailed and clear output.

If you only care about the results, the summary table can help you choose the best model, whether you consider residuals or goodness of fit.

If you want to know the details, such as the parameters and results of each model, you just need to click the model you want to view in the Details part, and you can understand the performance of the model from all aspects.

Here we show the parameters, confusion matrix, and profiler.

In the profiler, you can enter new data to observe the trend of change in each variable and get the predicted result.

We see that the influence of age is not significant, which may run counter to our expectations, whereas gender, diabetes, and ST_slope are the main influencing factors.

Moreover, in these results, we pay attention to the misclassification rate, especially the false negatives, because a false negative means that the patient has heart failure and we predict that he does not, which may lead to very serious consequences.
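The quantities mentioned here can be computed directly from the actual and predicted labels. The Python sketch below is an editorial illustration, not output from JMP: the function `classification_summary` and the label lists are hypothetical, and 1 is assumed to mean "has heart failure".

```python
def classification_summary(actual, predicted, positive=1):
    """Count confusion-matrix cells and derive the rates discussed in the talk."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1  # patient has the disease and the model says so
        elif a == positive:
            fn += 1  # patient has the disease but the model misses it
        elif p == positive:
            fp += 1  # healthy patient flagged as diseased
        else:
            tn += 1  # healthy patient correctly cleared
    total = tp + fp + tn + fn
    positives = tp + fn
    return {
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        "misclassification_rate": (fp + fn) / total,
        "sensitivity": tp / positives if positives else float("nan"),
        "false_negative_rate": fn / positives if positives else float("nan"),
    }


# Made-up labels for illustration only (1 = heart failure, 0 = no heart failure).
actual = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]
summary = classification_summary(actual, predicted)
print(summary)
```

With these made-up labels there is one false negative and one false positive out of ten cases. Sensitivity and the false negative rate always sum to one, so a model with the highest sensitivity is also the one that misses the fewest patients.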

The best-performing model we select in this analysis is the Boosted Tree, which has the lowest misclassification rate and the highest sensitivity.

Then we can save all the prediction formulas and the results for use in Model Comparison.

Model Comparison provides a more concise and intuitive format for showing model performance indicators, which makes it convenient for us to make the final choice.
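The final choice described above can be expressed as a simple ranking rule. The Python sketch below is an assumption about how such a rule might look, not JMP's Model Comparison logic: the function `pick_best_model`, the model names, and all numbers are invented for illustration and are not the results from the presentation.

```python
def pick_best_model(results):
    """Pick the model with the lowest misclassification rate; break ties by highest sensitivity."""
    return min(
        results,
        key=lambda name: (
            results[name]["misclassification_rate"],
            -results[name]["sensitivity"],
        ),
    )


# Invented metrics for three placeholder models; not the values from the talk.
results = {
    "Model A": {"misclassification_rate": 0.15, "sensitivity": 0.86},
    "Model B": {"misclassification_rate": 0.12, "sensitivity": 0.88},
    "Model C": {"misclassification_rate": 0.12, "sensitivity": 0.91},
}
best = pick_best_model(results)
print(best)  # Model C
```

Models B and C tie on misclassification rate, so the higher sensitivity of Model C decides the ranking.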

Now, I will take you through the last part of our presentation, which is the dashboard.

Using the dashboard feature, we created a utility where we added several important features that we discussed before and that critically affect heart failure.

Here we can interact by providing the inputs.

For example, I can choose male or female, and we can even choose the Chest pain type.

We can also choose the ST_slope pattern and the Exercise Angina value.

Based on these inputs, the utility displays the probability of heart failure, which is a pretty useful feature.

That brings us to the last part.

In conclusion, I just want to summarize.

We used the Model Screening platform, through which we explored the best predictive model for heart failure prediction.

We also built on that work using the JMP 16 dashboard, with which we developed an interactive utility that outputs the probability of heart failure based on the input parameters.

That's all we have for today.

Thank you.