Hello. Good morning, everyone.
This is Saijac Lami, and with me is my teammate, Zhe Diao.
We are business analytics graduate students
from the University of Connecticut, Stamford campus.
A little about our exposure to JMP.
We used JMP extensively
in our Prediction Model course in our first semester.
We found it a very easy and very powerful tool,
and there is a lot we can do with it.
We are still exploring the many features of JMP.
Today, we are here to present the work we did during the summer,
which is heart failure prediction using the Model Screening platform.
We are calling it improved because we use several JMP platforms
to leverage the predictions.
Coming to today's agenda, this is just an overview slide
from which you will get the gist of what we are doing.
It is followed by three slides
where we talk about the pre-processing,
some EDA that we have done, and the modeling.
Coming to the introduction,
we know that cardiovascular disease
is the number one cause of death globally,
claiming an estimated 17.9 million lives in 2019.
It accounts for around 32% of deaths worldwide every year.
For our problem, we took a Kaggle data set
and developed a classification model for classifying heart disease.
We also leveraged these predictions using the Model Screening,
Model Comparison, and dashboard features in JMP 16.
The model output helps in understanding the importance of the factors
that lead to heart disease.
We also find the probability of developing heart disease
under certain conditions.
To summarize, our objective is to build the best model and find the factors
that lead to heart failure using the JMP 16 platform.
Coming to the methodology and a little about our data set,
the data set includes 918 participants from different countries.
There are 11 factors associated with heart failure,
such as age, sex, blood pressure, and blood glucose.
For our predictions,
we first pre-processed the data
by checking for missing values and outliers.
We then performed EDA to understand the importance
of each feature and its relationship to heart failure.
To build the model,
we incorporated the following JMP 16 capabilities in our methodology.
The first is Model Screening,
which is an efficient platform for simultaneously fitting,
comparing, exploring, selecting,
and then deploying the best predictive model.
Next comes Model Comparison, which is an easy platform
for comparing and selecting the best-performing predictive model.
Next comes the dashboard, which is an efficient way
to present our EDA concisely.
We can rerun it any time new data is available.
Coming to our results,
this is just an overview of the results we had.
Using Model Screening,
we identified Boosted Tree as our best model.
We chose it not just on accuracy.
We focused on which model has the lowest false negative rate,
because we do not want our model to have many false negatives,
because we do not want
heart failure patients to go undetected.
Based on this, we chose Boosted Tree as our best model.
Coming to the column contributions,
we tried to identify
the important factors that drive the heart failure prediction.
Using the Boosted Tree,
we identified that Exercise Angina,
which is whether a person has
chest pain induced by exercise,
along with Fasting BS (the fasting blood sugar level),
Resting ECG, ST_slope, and Chest pain type,
are a few parameters out of the 11
that contribute around 75% of the heart failure prediction.
Next comes a more in-depth analysis, which Zhe Diao will take care of.
Okay, after screening through the basic information of the data,
such as the target feature, predictive variables, and data types,
our analysis work starts with data cleaning.
We need to deal with missing values and outliers first to get clean data.
JMP provides a variety of ways to explore and deal with them.
For missing values, JMP can display the details
in a summary table or display them visually in a cell plot or tree map.
Today, we show the statistical table here,
which is also a way to get the information we want.
We can see that there are no missing values in our data.
But when you explore the data distributions further, you will find that
some indicators use the number zero in place of a missing value.
We treated these zeros with deletion and median replacement because
the value of these indicators cannot be zero.
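
As a rough illustration of the median replacement described here, the following is a minimal Python sketch rather than the JMP workflow used in the talk. The function name `replace_zeros_with_median` and the sample readings are hypothetical, and it assumes that a zero in the chosen column always means "not recorded".

```python
from statistics import median


def replace_zeros_with_median(values):
    """Treat 0 as a missing-value code and fill it with the median of the nonzero values."""
    observed = [v for v in values if v != 0]
    if not observed:
        raise ValueError("no nonzero values to compute a median from")
    fill = median(observed)
    return [fill if v == 0 else v for v in values]


# Hypothetical readings for one indicator; 0 stands in for "not recorded".
readings = [289, 0, 180, 214, 0, 195]
cleaned = replace_zeros_with_median(readings)
```

The median is computed from the nonzero readings only, so the placeholder zeros do not pull the fill value down.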
For outliers,
box plots and the Explore Outliers module are common methods.
Today, we use the Outlier Analysis function
in the Multivariate module,
which reflects the distance in multi-dimensional space
in this Mahalanobis Distance graph.
We retained these outliers in this analysis because we consider
them a common phenomenon in medical test results.
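
To make the Mahalanobis distance idea concrete, here is a small Python sketch restricted to two variables so that the covariance matrix can be inverted by hand. It is an illustration under stated assumptions, not JMP's implementation: the function name `mahalanobis_2d` and the sample points are made up, and the sample covariance (dividing by n - 1) is assumed.

```python
from math import sqrt


def mahalanobis_2d(points):
    """Return the Mahalanobis distance of each (x, y) point from the sample mean."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Entries of the 2x2 sample covariance matrix (divide by n - 1).
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det == 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d^2 = [dx dy] * inverse(S) * [dx dy]^T, written out for the 2x2 case.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(sqrt(max(d2, 0.0)))  # guard against tiny negative rounding
    return distances


# Hypothetical pairs of two measurements; the last point is deliberately extreme.
points = [(120, 150), (130, 140), (125, 160), (140, 130), (135, 145), (200, 60)]
distances = mahalanobis_2d(points)
most_extreme = distances.index(max(distances))
```

Points with a large distance are candidates for outliers. Whether to drop or keep them remains a judgment call, as in the analysis here.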
After completing these steps,
we have clean data, and we enter the data exploration stage.
In this step, we built some commonly used
charts to show some of the information contained in the data.
JMP provides many choices for this part,
such as tree maps, ring charts, bar charts, and so on.
From these graphs, we know that the proportion of males
suffering from heart failure is twice that of females.
80% of patients with heart failure have diabetes,
and 77 percent have no symptoms of chest pain,
which reveals the imperceptibility of the disease.
After drawing these useful conclusions,
we come to the modeling stage to further explore the relationships in the data.
When you are doing data analysis,
you may often wonder which model to build
or which model performs best.
The Model Screening function in JMP helps us
solve this problem in a very intuitive way.
You just need to drag the target variable
and the predictor variables to the corresponding positions.
JMP will run all the appropriate models for you.
In this analysis, JMP ran nine models automatically,
including a regression model, Boosted Tree, Neural network, and so on.
You get a detailed and clear output.
If you only care about the results,
the summary table can help you choose the best model,
whether you consider the residuals or the goodness of fit.
If you want to know the details,
the parameters, and the results of each model,
you just need to click the model you want to view in the details part,
and you can understand the performance of the model from all aspects.
Here we excerpt the parameters, confusion matrix, and profiler.
In the profiler, you can enter new data
to observe the change trend of each variable and get the predicted result.
We see that the influence of age is not significant,
which may be contrary to our intuition,
whereas gender, diabetes, and ST_slope are the main influencing factors.
Moreover, in these results,
we pay attention to the misclassification rate,
especially the false negative value,
because it means that the patient has heart failure
and we predict that he does not,
which may lead to very serious consequences.
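
The quantities mentioned here can all be read off a confusion matrix. The sketch below shows one way to compute them in Python. The function names and the ten labels are hypothetical placeholders, not results from this analysis, and 1 is assumed to mean "has heart failure".

```python
def confusion_counts(actual, predicted):
    """Count true positives, false negatives, false positives, and true negatives."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1
        elif a == 1 and p == 0:
            fn += 1  # patient has heart failure, model says no
        elif a == 0 and p == 1:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def summarize(actual, predicted):
    """Return the misclassification rate, sensitivity, and false negative rate."""
    tp, fn, fp, tn = confusion_counts(actual, predicted)
    total = tp + fn + fp + tn
    positives = tp + fn
    if total == 0 or positives == 0:
        raise ValueError("need at least one observation and one actual positive")
    return {
        "misclassification_rate": (fp + fn) / total,
        "sensitivity": tp / positives,
        "false_negative_rate": fn / positives,
    }


# Made-up labels for illustration only.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]
metrics = summarize(actual, predicted)
```

Sensitivity and the false negative rate add up to one, so choosing the model with the highest sensitivity is the same as choosing the one that misses the fewest true cases.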
The best-performing model we selected in this analysis is Boosted Tree,
which has the lowest misclassification rate and the highest sensitivity.
Then we can save all the prediction formulas
and the results for use in Model Comparison.
Model Comparison provides a more concise and intuitive format
for showing model performance indicators,
which makes it convenient for us to make the final choice.
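
The final choice among models can be thought of as sorting a summary table. The Python sketch below assumes a simple rule, lowest misclassification rate first and highest sensitivity as the tie-breaker, which is one reading of the criteria described in this talk. The model names and numbers are invented placeholders, not output from JMP.

```python
def rank_models(summaries):
    """Sort by lowest misclassification rate, breaking ties by highest sensitivity."""
    return sorted(
        summaries,
        key=lambda m: (m["misclassification_rate"], -m["sensitivity"]),
    )


# Placeholder summary rows; these are not the results reported in the talk.
candidates = [
    {"name": "Model A", "misclassification_rate": 0.15, "sensitivity": 0.86},
    {"name": "Model B", "misclassification_rate": 0.12, "sensitivity": 0.88},
    {"name": "Model C", "misclassification_rate": 0.12, "sensitivity": 0.91},
]
ranked = rank_models(candidates)
best = ranked[0]
```

A different priority, such as sensitivity first, only requires swapping the order of the two entries in the sort key.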
Now, I will take you through the last part of our presentation, which is the dashboard.
Using the dashboard feature,
we created a utility where we added several important features,
which we discussed before and which critically affect heart failure.
Here we can interact by providing the inputs.
For example, I can choose male or female,
and we can even choose the Chest pain type,
as well as the ST_slope pattern and the Exercise Angina value.
Based on this, I can interact with the utility,
and it will display the probability of heart failure,
which is a pretty useful feature.
That brings us to the last part.
In conclusion, I just want to summarize.
We used the Model Screening platform, through which we explored
the best predictive model for heart failure prediction.
We also leveraged that work using the JMP 16 dashboard,
with which we developed an interactive utility
that outputs the probability of heart failure based on the input parameters.
That's all we have for today.
Thank you.