Hi @maryam_nourmand,
If your objective is to evaluate the robustness of your model's performance, I can only second and emphasize the options suggested by @dlehman1.
There are several ways to evaluate a model's robustness:
- By measuring model performance under various random seeds (since randomness is part of learning in ML models: "No learning without randomness"),
- By measuring model performance across different training and validation sets.
For the second option, using a Validation formula column together with the "Simulate" option lets you try the model under a large number of different training conditions/sets. You can read more about how to do this in JMP in the following posts (a small Python sketch of the same idea follows the links):
How can I automate and summarize many repeat validations into one output table?
Boosted Tree - Tuning TABLE DESIGN
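If you work outside JMP, the same idea can be reproduced in Python with scikit-learn by re-fitting the model over many random seeds and training/validation splits and collecting the resulting performance values. This is only a minimal sketch under assumed placeholder data and a generic classifier, not the JMP workflow itself:

```python
# Minimal sketch: repeat training over many random seeds / validation splits
# and collect the performance distribution (analogue of the "Simulate" idea).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)  # placeholder data

scores = []
for seed in range(100):  # 100 different training/validation conditions
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y
    )
    model = RandomForestClassifier(random_state=seed)  # seed also drives model randomness
    model.fit(X_tr, y_tr)
    scores.append(accuracy_score(y_val, model.predict(X_val)))

scores = np.array(scores)
print(f"accuracy: mean={scores.mean():.3f}, sd={scores.std():.3f}")
```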
With the results from these simulations, you can then visualize and compare performance distributions between different models, and run statistical tests if needed, for example to compare the mean/median/variance of the simulated performance metrics between models.
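As an illustration of such a comparison, here is a minimal sketch assuming two arrays of simulated validation scores (one per model, filled here with placeholder values), using a Mann-Whitney test for the central performance and a Levene test for the variability:

```python
# Minimal sketch: compare simulated performance distributions of two models.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
scores_a = rng.normal(0.85, 0.02, size=100)  # placeholder: model A simulated accuracies
scores_b = rng.normal(0.83, 0.04, size=100)  # placeholder: model B simulated accuracies

# Compare central performance (medians) without assuming normality
u_stat, p_median = stats.mannwhitneyu(scores_a, scores_b)
# Compare performance variability (a robustness criterion) between the two models
w_stat, p_var = stats.levene(scores_a, scores_b)

print(f"median comparison p-value: {p_median:.4f}")
print(f"variance comparison p-value: {p_var:.4f}")
```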
As @dlehman1 stated, the default metrics are a good starting point, but you might want to fine-tune them, since classification errors may not all have the same "importance" (a false negative can cost far more than a false positive, for example).
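One simple way to reflect this is a cost-weighted error built from the confusion matrix. A minimal sketch, where the per-error costs and the labels/predictions are purely illustrative assumptions:

```python
# Minimal sketch: cost-weighted misclassification error, assuming (for illustration)
# that a false negative is 5x more costly than a false positive.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])   # placeholder labels
y_pred = np.array([0, 1, 1, 0, 1, 0, 0, 0])   # placeholder predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
cost_fp, cost_fn = 1.0, 5.0                    # assumed business costs per error type
total_cost = cost_fp * fp + cost_fn * fn
print(f"weighted misclassification cost: {total_cost}")
```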
Also be careful with the probabilities reported by ML models if you intend to use them as "confidence levels": not all models are calibrated, and depending on the model chosen and the sample size, this can have different impacts and consequences. More about the calibration topic:
https://scikit-learn.org/stable/auto_examples/calibration/plot_compare_calibration.html
https://ploomber.io/blog/calibration-curve/
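To check calibration in practice with scikit-learn, you can plot a reliability curve and, if needed, recalibrate the probabilities. A minimal sketch with placeholder data and a Naive Bayes model (chosen only because it is a classic example of a poorly calibrated classifier):

```python
# Minimal sketch: check and, if needed, improve probability calibration.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.calibration import CalibratedClassifierCV, calibration_curve

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

raw = GaussianNB().fit(X_tr, y_tr)                      # often poorly calibrated
prob_raw = raw.predict_proba(X_te)[:, 1]
frac_pos, mean_pred = calibration_curve(y_te, prob_raw, n_bins=10)
print("raw reliability curve:", list(zip(mean_pred.round(2), frac_pos.round(2))))

# Isotonic recalibration fitted on cross-validated folds
calibrated = CalibratedClassifierCV(GaussianNB(), method="isotonic", cv=5).fit(X_tr, y_tr)
prob_cal = calibrated.predict_proba(X_te)[:, 1]
frac_pos_c, mean_pred_c = calibration_curve(y_te, prob_cal, n_bins=10)
print("calibrated reliability curve:", list(zip(mean_pred_c.round(2), frac_pos_c.round(2))))
```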
Hope this answer and these few considerations help,
Victor GUILLER
"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)