Improved Heart Failure Prediction Using Model Screening Platform (2022-US-EPO-1137)

Poster Winner

Di Zhao, Graduate, University of Connecticut
Sai Jahnavi Gamalapati, Graduate, University of Connecticut

Cardiovascular disease is the number one cause of death globally, claiming an estimated 17.9 million lives in 2019, accounting for 32% of all deaths worldwide that year.

Heart failure is a common illness of cardiovascular disease, and this dataset contains 11 features that can be used to predict likely heart disease. The prediction results can help people with cardiovascular disease or high cardiovascular risk (due to the presence of one or more risk factors, such as hypertension, diabetes, hyperlipidemia, or established diseases) to predict early symptoms and detect disease risk in a timely manner.

The data set included 918 participants from different countries and 11 factors associated with heart failure, such as age, sex, blood pressure, blood glucose, etc. This study plans to use different analysis models in JMP software for statistical analysis of data sets, such as neural networks, logistic classifiers, Random Forest, etc. The optimal prediction model is selected by comparing model performance.

The model output will help people understand the importance of different factors leading to heart disease and the probability of developing heart disease under certain conditions, to help people pay more attention to the management of physical health in daily life and the prediction of disease risk.

Hello. Good morning, everyone.

This is Sai Jahnavi Gamalapati, and I have my teammate, Di Zhao.

Basically, we are business analytics graduate students

from the University of Connecticut, Stamford campus.

A little about our exposure to JMP.

We have extensively used JMP

in our Prediction Model course in our first semester.

We felt it's a very easy and very powerful tool,

and there is a lot we can do with it.

We are still exploring the many features of JMP.

Today, we are here to present the work we did during the summer,

which is heart failure prediction using the Model Screening platform.

We are calling it improved because we use several JMP platforms

to leverage predictions.

Coming to the agenda today, this is just an overview slide

from which you will get the gist of what we are doing,

followed by three slides

where we talk about pre-processing,

some EDA that we have done, and the modeling.

Coming to the introduction,

we know that cardiovascular disease

is the number one cause of death globally,

claiming an estimated 17.9 million lives in 2019.

It accounts for around 32% of deaths worldwide that year.

In our problem, we have taken a gathered data set,

and we developed a classification model for classifying heart disease.

We also leverage these predictions using the Model Screening,

Model Comparison, and dashboard features in JMP 16.

The model output will help in understanding the importance of factors

that lead to heart disease.

We also find the probability of developing heart disease

under certain conditions.

Summarizing, our objective is to build the best model and find the factors

that lead to heart failure using the JMP 16 platform.

Coming to the methodology and a little about our data set,

the data set included 918 participants from different countries.

There are 11 factors associated with heart failure,

such as age, sex, blood pressure, and blood glucose.

How we arrived at our predictions

is we first performed the pre-processing of the data

by checking whether there are any missing values or outliers.

Further, we performed EDA to understand the importance

of each feature and its relationship to heart failure.

To build the model,

we incorporated the following JMP 16 capabilities in our methodology.

The first is Model Screening,

which is an efficient platform for simultaneously fitting,

comparing, exploring, selecting,

and then deploying the best predictive model.

Next comes Model Comparison, which is an easy platform

to compare and select the best-performing predictive model.

Next comes the dashboard, which is an efficient way

to represent our EDA concisely.

We can rerun it any time as new data becomes available.

Coming to our results.

This is just an overview of the results we had.

Using the model screening,

we identified that the Boosted Tree is our best model.

We chose it not just on accuracy.

We focused on which model has the lowest false negative rate,

because we do not want our model to have many false negatives,

because we don't want

heart failure patients to go undetected.

Based on this, we have chosen the Boosted Tree as our best model.
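The selection idea described here (look beyond accuracy and favor the model that misses the fewest true heart failure patients) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three model names and all confusion-matrix counts are hypothetical, not the study's results, and the tie-breaking rule is our own choice rather than anything JMP does.

```python
def confusion_rates(tp, fn, fp, tn):
    """Return summary rates computed from confusion-matrix counts."""
    total = tp + fn + fp + tn
    positives = tp + fn  # patients who truly have heart failure
    return {
        "misclassification_rate": (fp + fn) / total,
        "false_negative_rate": fn / positives,
        "sensitivity": tp / positives,
    }

# Hypothetical validation counts for three unnamed models (not the study's results).
candidates = {
    "Model A": {"tp": 90, "fn": 10, "fp": 8, "tn": 92},
    "Model B": {"tp": 95, "fn": 5, "fp": 14, "tn": 86},
    "Model C": {"tp": 92, "fn": 8, "fp": 9, "tn": 91},
}

metrics = {name: confusion_rates(**counts) for name, counts in candidates.items()}

# Prefer the lowest false negative rate; break ties on misclassification rate.
best_by_fnr = min(
    metrics,
    key=lambda name: (
        metrics[name]["false_negative_rate"],
        metrics[name]["misclassification_rate"],
    ),
)

# For contrast: the model that would win on overall error alone.
most_accurate = min(metrics, key=lambda name: metrics[name]["misclassification_rate"])

for name, m in metrics.items():
    print(name, m)
print("Lowest false negative rate:", best_by_fnr)
print("Lowest misclassification rate:", most_accurate)
```

With these made-up counts, the model with the fewest missed patients is not the one with the lowest overall error, which is the trade-off the speakers describe.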

Coming to the column contributions,

when we tried to identify

the important factors driving the heart failure prediction

using the Boosted Tree,

we identified that Exercise Angina,

which is whether a person has

pain induced by exercise,

and also Fasting BS, the blood glucose level,

Resting ECG, ST_Slope, and Chest Pain Type,

a few parameters out of the 11,

account for around 75% of the contribution to the heart failure prediction.
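To make the "a few features account for most of the contribution" statement concrete, here is a minimal Python sketch that ranks features by contribution and accumulates their share of the total. The feature names `f1` to `f6` and their scores are hypothetical placeholders, not values from the Boosted Tree report, and the helper `top_features_for_share` is our own illustration.

```python
def top_features_for_share(contributions, target_share):
    """Return the highest-ranked features whose cumulative share of the
    total contribution first reaches `target_share`, plus that share."""
    total = sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    selected, cumulative = [], 0.0
    for name, value in ranked:
        selected.append(name)
        cumulative += value / total
        if cumulative >= target_share:
            break
    return selected, cumulative

# Hypothetical contribution scores (illustrative only).
scores = {"f1": 40, "f2": 20, "f3": 12, "f4": 10, "f5": 10, "f6": 8}

selected, share = top_features_for_share(scores, target_share=0.75)
print(selected, round(share, 2))
```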

Next comes a more in-depth analysis, which Di Zhao will take care of.

Okay, after screening through the basic information of the data,

such as target feature, predictive variables, data types,

our analysis work will start with data cleaning.

We need to deal with missing values and outliers first to get clean data.

JMP provides a variety of ways to explore and deal with them.

For missing values, JMP can display the details

in a summary table or display them in a cell plot or tree map.

Today, we show the statistical table here,

which is also a way to get the information we want.

We can see that there are no missing values in our data.

But when you further explore the data distribution, you will find that

some indicators use the number zero in place of a missing value.

We treat these zeros with deletion and median replacement because

the value of these indicators cannot be zero.
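The two treatments mentioned here, deleting rows and replacing with the median, can be sketched in plain Python. This is a rough illustration only: the column names, the records, and the choice of which column gets which treatment are all assumptions, since the talk does not say which indicators were handled which way.

```python
from statistics import median

# Hypothetical records; column names and values are illustrative only.
records = [
    {"age": 54, "resting_bp": 130, "cholesterol": 220},
    {"age": 61, "resting_bp": 0, "cholesterol": 180},
    {"age": 47, "resting_bp": 120, "cholesterol": 0},
    {"age": 58, "resting_bp": 140, "cholesterol": 260},
    {"age": 66, "resting_bp": 150, "cholesterol": 0},
]

def drop_zero_rows(rows, column):
    """Deletion: remove rows where `column` is recorded as 0."""
    return [r for r in rows if r[column] != 0]

def impute_zero_with_median(rows, column):
    """Median replacement: swap each 0 for the median of the non-zero values."""
    observed = [r[column] for r in rows if r[column] != 0]
    fill = median(observed)
    return [{**r, column: fill if r[column] == 0 else r[column]} for r in rows]

# Assumed split: delete rows with a zero blood pressure, impute zero cholesterol.
cleaned = drop_zero_rows(records, "resting_bp")
cleaned = impute_zero_with_median(cleaned, "cholesterol")

for row in cleaned:
    print(row)
```

Deletion suits a column with only a handful of zeros, while median replacement keeps the rows when zeros are frequent; which applies to which column is a judgment call for the analyst.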

For outliers,

box plots and the Explore Outliers module are common methods.

Today, we use the Outlier Analysis function

in the Multivariate module,

which reflects distance in a multi-dimensional space

in this Mahalanobis Distance graph.

We have retained these outliers in this analysis because we consider that

this is a common phenomenon in medical test results.
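For readers unfamiliar with the Mahalanobis distance, the following Python sketch computes it for two variables using the sample mean and sample covariance. It is a simplified illustration, not JMP's implementation: the six points are made up, and the function handles only the two-dimensional case so that the covariance matrix can be inverted by hand.

```python
from math import sqrt

def mahalanobis_2d(points):
    """Mahalanobis distance of each (x, y) point from the sample mean,
    using the sample covariance matrix (divisor n - 1)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)

    # Closed-form inverse of the 2x2 covariance matrix.
    det = sxx * syy - sxy * sxy
    inv_xx, inv_yy, inv_xy = syy / det, sxx / det, -sxy / det

    distances = []
    for x, y in points:
        dx, dy = x - mean_x, y - mean_y
        d2 = dx * dx * inv_xx + 2 * dx * dy * inv_xy + dy * dy * inv_yy
        distances.append(sqrt(d2))
    return distances

# Hypothetical measurements; the last point sits far from the rest.
points = [(1, 2), (2, 1), (2, 3), (3, 2), (2, 2), (10, 10)]
distances = mahalanobis_2d(points)

for point, d in zip(points, distances):
    print(point, round(d, 3))
```

The point far from the cluster gets the largest distance. Whether to drop such a point is a separate decision, and the speakers chose to keep theirs.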

After completing these steps,

we get the clean data, and then we enter the data exploration stage.

In this step, we built some commonly used

charts to show some of the information contained in our data.

JMP provides us many choices in this part,

such as tree map, ring chart, bar chart, and so on.

From these graphs, we know that the proportion of males

suffering from heart failure is twice that of females.

80% of patients with heart failure have diabetes,

and 77% have no symptoms of chest pain,

which reveals the imperceptibility of the disease.

After we draw these useful conclusions,

we come to the modeling stage to further explore the relationships in the data.

When you are doing data analysis,

you may often wonder what model you want to build,

or what model performs best.

The Model Screening function in JMP helps us

solve this problem in a very intuitive way.

You just need to drag the target variable

and the predictor variables to the corresponding positions.

JMP will help you run all the appropriate models.

In this analysis, JMP ran nine models automatically,

including a regression model, Boosted Tree, Neural network, and so on.

You can get a detailed and clear output.

If you only care about the results,

the summary table can help you choose the best model,

whether you consider residuals or goodness of fit.

If you want to know the details,

the parameters and the results of each model,

you just need to click the model you want to view in the Details part,

and you can understand the performance of the model from all aspects.

Here we captured the parameters, confusion matrix, and profiler.

In the profiler, you can enter new data

to observe the change trend of each variable and get the predicted result.

We see that the influence of age is not significant,

which may be counter to our common understanding,

while gender, diabetes, and ST_Slope are the main influencing factors.

Moreover, in these results,

we pay attention to the misclassification rate,

especially the false negative value,

because it means that the patient has heart failure,

and we predict that he does not,

which may lead to very serious consequences.

The best-performing model we select in this analysis is the Boosted Tree,

which has the lowest misclassification rate and the highest sensitivity.

Then we can save all the prediction formulas

and the results for use in the model comparison.

Model Comparison provides a more concise and intuitive format

to show model performance indicators,

which is convenient for us to make the final choice.

Now, I'll take you through the last part of our presentation, which is the dashboard.

Using the dashboard feature,

we created a utility where we added several important features,

which we discussed before, and which critically affect heart failure.

Here we can interact by providing the inputs.

For example, I can choose male or female,

and we can even choose the Chest Pain Type,

and also the ST_Slope pattern and the Exercise Angina value.

Based on this, I can interact with the utility,

and based on that, it will display the probability of heart failure,

which is a pretty useful feature.

That brings us to the last part.

In conclusion, I just want to summarize.

We used the Model Screening platform, through which we explored

the best predictive model for heart failure prediction.

We also built on that work using the JMP 16 dashboard,

with which we developed an interactive utility

that outputs the probability of heart failure based on the input parameters.

That's all we have for today.

Thank you.