Model Screening in JMP Pro 16

mia_stephens · Oct 28, 2020 01:49 PM

Predictive modeling is all about finding the model that accurately predicts the outcome of interest. Let’s say, for example, that I want to predict the progression of a particular disease – in this case, diabetes – after one year, based on a number of factors. More specifically, I want to identify patients who are most likely to have high disease progression so that appropriate interventions can be made.

I have JMP Pro, and I know that there are several possible predictive models I can fit. For example, I can fit a regression model, various types of tree-based models, or a neural network, to name a few. Which modeling platform should I use? Which type of model will predict the outcome most accurately? For any given modeling situation, the best model depends largely on the data – there’s no one type of model that works best for all problems.

To find the best model for the diabetes scenario, I can fit all of the possible models, one at a time, using cross-validation. I can save the individual models to the data table, or to the Formula Depot, and then use Model Comparison to compare the performance of the models on the validation set to select the best one. Or I can select the best combination of models. While this method more than gets the job done, it does require a bit of work.
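As a rough illustration of that one-at-a-time workflow outside of JMP, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the toy rows, the two stand-in "models" (a majority-class baseline and a one-split stump), and the function names. It is not how JMP fits or compares models. It only shows the pattern of fitting each candidate on the training rows, scoring it on the validation rows, and picking the best.

```python
from collections import Counter

# Toy rows of (predictor value, outcome label). All values are invented.
train = [(1.0, "Low"), (2.0, "Low"), (3.0, "Low"), (4.0, "High"),
         (5.0, "High"), (6.0, "High"), (7.0, "High")]
valid = [(1.5, "Low"), (2.5, "Low"), (3.2, "High"), (4.5, "High"), (5.5, "High")]


def fit_majority(rows):
    """Baseline: always predict the most common training label."""
    majority = Counter(y for _, y in rows).most_common(1)[0][0]
    return lambda x: majority


def fit_stump(rows):
    """One-split rule: predict "High" when x exceeds the best training cutoff."""
    xs = sorted(x for x, _ in rows)
    cutoffs = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

    def training_errors(cutoff):
        return sum(("High" if x > cutoff else "Low") != y for x, y in rows)

    best_cutoff = min(cutoffs, key=training_errors)
    return lambda x: "High" if x > best_cutoff else "Low"


def misclassification_rate(model, rows):
    """Share of rows where the predicted label differs from the actual label."""
    return sum(model(x) != y for x, y in rows) / len(rows)


candidates = {"Majority baseline": fit_majority, "Decision stump": fit_stump}
results = {}
for name, fit in candidates.items():
    model = fit(train)                                     # fit on training rows only
    results[name] = misclassification_rate(model, valid)   # score on held-out rows

best_name = min(results, key=results.get)
for name, rate in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name}: validation misclassification rate = {rate:.2f}")
print("Best on validation:", best_name)
```

With real data, each entry in `candidates` would be a full modeling method, and the loop would typically record several validation statistics rather than one.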

Now, with the new Model Screening platform in JMP Pro 16, we’ve made this much easier. Model Screening provides an efficient workflow for simultaneously fitting, comparing, exploring, selecting and then deploying the best predictive model.

To illustrate, I’ll use the Diabetes data from the JMP Sample Data library. The response, or outcome variable, is Y Binary, which has two possible values: Low (for low disease progression) and High. The predictors, or factors, are Age through Glucose.

1 Diabetes data.png

I select Model Screening from the Analyze > Predictive Modeling menu and specify the response and factors. The Validation column partitions the data into Training and Validation sets, so I’ll use this for model validation.

The top portion of the Model Screening dialog is shown here:

Model Dialog.png

At the bottom of the dialog, there are several different modeling methods to choose from, along with many different fitting and cross-validation options. Note that XGBoost is an available method, because I installed the XGBoost Add-In from the JMP User Community.

I’ll use the default models, but will add two-way interactions and will set the random seed to 11111 to make my results reproducible.

3 Model Screening Dialog Bottom.png

All of the models that make sense for these data are fit, and several validation statistics are provided. The models are sorted in descending order of Generalized RSquare, but to find the model or models that stand out across all of the validation statistics, I’ll click Select Dominant.

I can see that the top models, overall, are Neural Boosted and Generalized Regression Lasso with two-way interactions.

Select Dominant.png
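The exact rule behind Select Dominant is not spelled out here. One common reading of "dominant" is Pareto dominance: a model is kept if no other model is at least as good on every statistic and strictly better on at least one. The Python sketch below illustrates that idea under this assumption. The function names, metric names, and numbers are all hypothetical and are not taken from the report.

```python
def dominates(a, b, higher_is_better):
    """True if a is at least as good as b on every metric and strictly better on one."""
    at_least_as_good = True
    strictly_better = False
    for metric, higher in higher_is_better.items():
        # Flip the sign for "lower is better" metrics so larger always means better.
        a_val = a[metric] if higher else -a[metric]
        b_val = b[metric] if higher else -b[metric]
        if a_val < b_val:
            at_least_as_good = False
        if a_val > b_val:
            strictly_better = True
    return at_least_as_good and strictly_better


def select_dominant(models, higher_is_better):
    """Return the names of models that no other model dominates."""
    keep = []
    for name, stats in models.items():
        dominated = any(
            dominates(other, stats, higher_is_better)
            for other_name, other in models.items()
            if other_name != name
        )
        if not dominated:
            keep.append(name)
    return keep


# Hypothetical validation statistics, invented for illustration.
models = {
    "Model A": {"rsquare": 0.55, "misclassification": 0.20, "auc": 0.88},
    "Model B": {"rsquare": 0.52, "misclassification": 0.18, "auc": 0.87},
    "Model C": {"rsquare": 0.50, "misclassification": 0.22, "auc": 0.85},
}
higher_is_better = {"rsquare": True, "misclassification": False, "auc": True}

dominant = select_dominant(models, higher_is_better)
print(dominant)
```

In this toy example, Model A and Model B each win on at least one statistic, so both are kept, while Model C is worse than both on every statistic and is dropped.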

To explore the classifications for these models, I’ll select Decision Threshold from the top red triangle menu. This option provides a wealth of information for comparing the classification accuracy of the selected models. The top section shows the breakdown of true and false classifications for each model, and if I change the probability threshold (either by dragging the slider line or using the edit box), I can see how the classifications and classification errors change for different cutoffs.

Decision Threshold.mp4

The bottom section has tabs showing different classification graphs and metrics. The Metrics tab for the validation data is shown here. In this scenario, I want to identify patients who are most likely to have a high rate of diabetes progression. So, I want to find the model that results in the lowest false negative rate. At a probability threshold of 0.42, the Generalized Regression model has a low false negative rate, while still maintaining a low false positive rate relative to the other model.

5 Decision Threshold.png
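To make the threshold tradeoff concrete, here is a small Python sketch of how false negative and false positive rates can be computed at different probability cutoffs. The predicted probabilities and labels are invented, the function name is hypothetical, and the rule "classify as High when P(High) is at or above the threshold" is an assumption; JMP's handling of ties at the cutoff may differ.

```python
def rates_at_threshold(probs_high, actual, threshold):
    """Classify as "High" when P(High) >= threshold; return (FNR, FPR)."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs_high, actual):
        predicted_high = p >= threshold
        if y == "High" and predicted_high:
            tp += 1
        elif y == "High":
            fn += 1   # a High patient classified as Low (missed case)
        elif predicted_high:
            fp += 1   # a Low patient classified as High (false alarm)
        else:
            tn += 1
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return fnr, fpr


# Invented predicted probabilities of "High" and the matching actual labels.
probs_high = [0.10, 0.30, 0.40, 0.45, 0.48, 0.60, 0.80, 0.90]
actual = ["Low", "Low", "Low", "High", "High", "Low", "High", "High"]

for threshold in (0.30, 0.42, 0.50, 0.70):
    fnr, fpr = rates_at_threshold(probs_high, actual, threshold)
    print(f"threshold={threshold:.2f}  FNR={fnr:.2f}  FPR={fpr:.2f}")
```

Lowering the cutoff catches more High patients at the price of more false alarms, and raising it does the reverse, which is the same pattern the slider shows.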

Now, I might want to take a closer look at this model. Model Screening makes this seamless. I can easily run the selected model in a new window by clicking Run Selected.

Run Selected.png

Or, without leaving Model Screening, I can see the results directly in the Details outline as shown here. In Model Screening I have access to all of the options I'd see in Generalized Regression. So, for example, I can interact with the solution path, and I can save or publish the model for deployment. I can even run other models to see if I can get further improvement in model accuracy.

7 Gen Reg Solution Path.png

I've only scratched the surface of Model Screening here, but hopefully you can see how this streamlined workflow can help you save a lot of time and effort! You simply need to specify the variables in your model, select the models you want to fit and the validation method, and Model Screening does the rest!

To learn more about fitting and validating predictive models in JMP Pro, go to the key features page on our website.

What Customers Say

"Having used model screening on JMP Pro 16 beta these past few months, it has already become an indispensable part of my coral reef predictive modeling research. Although models could be compared rather easily in earlier versions of JMP Pro (e.g., JMP Pro 15’s “model comparison” platform), this new tool makes it even easier for users to simultaneously test many different models (e.g., gen-reg, PLS, discriminant, neural networks, etc.) with the same data set. Something that once took me hours or even days can now be done in minutes (or even seconds). Wow, what a great feature! This surely must be the biggest improvement from version 15 to 16, and I’m sure many others will feel the same."

- Anderson Mayfield, National Oceanic and Atmospheric Administration (NOAA) and the University of Miami

utkcito · ‎11-15-2020

When is JMP Pro 16 going to be launched?

Thanks,

Uriel

mia_stephens · ‎11-18-2020

It will be launched in March 2021.

MannyUy · ‎03-23-2021

I used to do essentially the same thing but using one model platform at a time. It took a lot of time. This new capability truly saves time. Thank you!

mia_stephens · ‎03-23-2021

Hi @MannyUy, yes, it's a real time-saver!

PatrickGiuliano · ‎05-06-2021

@mia_stephens this is super cool. I'm intrigued by the "Decision Thresholds" feature and am trying to understand it a bit better. In your example here, did you essentially pick the "intersection" on the Misclassification Count vs. Prob Threshold for High curves (the point I circled in black in the attached image) in determining that a Probability Threshold of 0.42 would be a good choice for minimizing false negatives (while keeping the false positive rate at about the same level)?

I used JMP 16 EA8 and ran almost the same setup, except that I specified 'XGBoost' as a model fitting option in the Model Screening dialog (and kept the same random seed, 11111).

Maybe one useful feature here would be a "remember settings" option, similar to the one in the Prediction Profiler (which generates a radio button for each saved setting), so you can easily toggle between Probability Thresholds and directly compare the effect at the specific thresholds you are interested in. Example image attached.

mia_stephens · ‎05-07-2021

Hi @PatrickGiuliano, this was a relatively superficial example, and I settled on 0.42 as a threshold for illustration purposes. The goal here was to find a cutoff that led to a low false negative rate while maintaining a low false positive rate. But what constitutes acceptable rates? This depends largely on the research situation or problem, and the "costs" of a particular type of error. For a particular situation, you might be willing to accept a higher false positive rate if it results in a better false negative rate. The graphs can help to find the sweet spot, to balance the two rates.

This is a small data set, so the results are fairly granular. But in the second graph here, I see that the false positive rate for Neural Boosted starts to grow quickly after around 0.45 or 0.5, while the false negative rate drops quickly until around that point and then starts to level out somewhat.

The idea of adding a "remembered settings" option is compelling. Would you be willing to post this to the JMP Wish List?

Thanks!

Mia

PatrickGiuliano · ‎05-10-2021

Hi @mia_stephens, thanks for your additional clarification! I agree that this is a tradeoff problem that largely depends on subject matter expertise, the product or user application, and risk. I see how the graphs are getting at this tradeoff, although the lines crisscrossing in both directions can be a bit confusing at first, I think (the solid lines in the increasing direction, the dashed lines in the decreasing). I'll add the topic idea to the wish list! @PatrickGiuliano

sukrit2020 · ‎10-29-2021

How can I visualize MAE in this platform?

mia_stephens · ‎10-29-2021

Thanks for the question @sukrit2020. MAE is currently not available from the Model Screening platform.