Hi @maryam_nourmand,
If your objective is to evaluate the robustness of your model's performance, I can only second and emphasize the options suggested by @dlehman1.
There are several ways to evaluate a model's robustness:
- By measuring model performance under various random seeds (since randomness is part of learning in ML models: "No learning without randomness"),
- By measuring model performance across different training and validation sets.
For the second option, using a Validation formula column together with the "Simulate" option lets you try the model under a large number of different training conditions/sets. You can read more about how to do this in JMP in the following posts (a small Python sketch of the same idea follows the links):
How can I automate and summarize many repeat validations into one output table?
Boosted Tree - Tuning TABLE DESIGN
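If you work outside JMP, the same idea can be reproduced in Python with scikit-learn by re-fitting the model over many random seeds and training/validation splits and collecting the resulting performance values. This is only a minimal sketch under assumed placeholder data and a generic classifier, not the JMP workflow itself:

```python
# Minimal sketch: repeat training over many random seeds / validation splits
# and collect the performance distribution (analogue of the "Simulate" idea).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)  # placeholder data

scores = []
for seed in range(100):  # 100 different training/validation conditions
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y
    )
    model = RandomForestClassifier(random_state=seed)  # seed also drives model randomness
    model.fit(X_tr, y_tr)
    scores.append(accuracy_score(y_val, model.predict(X_val)))

scores = np.array(scores)
print(f"accuracy: mean={scores.mean():.3f}, sd={scores.std():.3f}")
```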
With the results from these simulations, you can then visualize and compare performance distributions between different models, and run statistical tests if needed, for example to compare the mean/median/variance of the simulated performance metrics between models.
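As an illustration of such a comparison, here is a minimal sketch assuming two arrays of simulated validation scores (one per model, filled here with placeholder values), using a Mann-Whitney test for the central performance and a Levene test for the variability:

```python
# Minimal sketch: compare simulated performance distributions of two models.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
scores_a = rng.normal(0.85, 0.02, size=100)  # placeholder: model A simulated accuracies
scores_b = rng.normal(0.83, 0.04, size=100)  # placeholder: model B simulated accuracies

# Compare central performance (medians) without assuming normality
u_stat, p_median = stats.mannwhitneyu(scores_a, scores_b)
# Compare performance variability (a robustness criterion) between the two models
w_stat, p_var = stats.levene(scores_a, scores_b)

print(f"median comparison p-value: {p_median:.4f}")
print(f"variance comparison p-value: {p_var:.4f}")
```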
As @dlehman1 stated, the default metrics are a good starting point, but you might want to fine-tune them, since classification errors may not all have the same "importance" (a false negative can cost far more than a false positive, for example).
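One simple way to reflect this is a cost-weighted error built from the confusion matrix. A minimal sketch, where the per-error costs and the labels/predictions are purely illustrative assumptions:

```python
# Minimal sketch: cost-weighted misclassification error, assuming (for illustration)
# that a false negative is 5x more costly than a false positive.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])   # placeholder labels
y_pred = np.array([0, 1, 1, 0, 1, 0, 0, 0])   # placeholder predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
cost_fp, cost_fn = 1.0, 5.0                    # assumed business costs per error type
total_cost = cost_fp * fp + cost_fn * fn
print(f"weighted misclassification cost: {total_cost}")
```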
Also be careful with the probabilities reported by ML models if you intend to use them as "confidence levels": not all models are calibrated, and depending on the model chosen and the sample size, this can have different impacts and consequences. More about the calibration topic:
https://scikit-learn.org/stable/auto_examples/calibration/plot_compare_calibration.html
https://ploomber.io/blog/calibration-curve/
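To check calibration in practice with scikit-learn, you can plot a reliability curve and, if needed, recalibrate the probabilities. A minimal sketch with placeholder data and a Naive Bayes model (chosen only because it is a classic example of a poorly calibrated classifier):

```python
# Minimal sketch: check and, if needed, improve probability calibration.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.calibration import CalibratedClassifierCV, calibration_curve

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

raw = GaussianNB().fit(X_tr, y_tr)                      # often poorly calibrated
prob_raw = raw.predict_proba(X_te)[:, 1]
frac_pos, mean_pred = calibration_curve(y_te, prob_raw, n_bins=10)
print("raw reliability curve:", list(zip(mean_pred.round(2), frac_pos.round(2))))

# Isotonic recalibration fitted on cross-validated folds
calibrated = CalibratedClassifierCV(GaussianNB(), method="isotonic", cv=5).fit(X_tr, y_tr)
prob_cal = calibrated.predict_proba(X_te)[:, 1]
frac_pos_c, mean_pred_c = calibration_curve(y_te, prob_cal, n_bins=10)
print("calibrated reliability curve:", list(zip(mean_pred_c.round(2), frac_pos_c.round(2))))
```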
Hope this answer and these few considerations help,
Victor GUILLER
"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)