Hi @NishaKumar2023,
You can use a K-folds cross-validation column while tuning your hyperparameters, but it doesn't work exactly as you may intend. Here is the methodology:
- Create a K-folds validation column (either fixed or a formula, depending on your objectives and reproducibility needs): Launch the Make Validation Column Platform (jmp.com)
Make sure the split into folds is representative and balanced by using stratification, and that any constraint/duplication in your data is respected by using grouping (for example, keeping the same ID in the same fold). A conceptual sketch of this workflow outside JMP is given just after this list.
- Open or create a Tuning data table for the Boosted Tree (or any other platform) you would like to launch. I attached an example for the Boosted Tree, but you can find other tuning tables provided by @SDF1 in this post: Malfunction in Bootstrap Forest with Tuning Design Table?
- When launching the Boosted Tree (or any other modeling platform), specify your inputs, the response to model, and use the K-folds validation column in the validation panel (here on the Mushroom JMP dataset).
- A new window pops up: check "Use Tuning table" and then select the tuning table you already have open.
- You'll then get the results of the best tuned models.
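To make the workflow more concrete outside JMP, here is a minimal Python/scikit-learn sketch of the same ideas (a stratified fold column, a small "tuning table"-like grid, and selection on a validation fold). The data, column names and hyperparameter values are placeholders, not a reproduction of the JMP platform:

```python
# Sketch only: stratified K-folds "validation column" + grid-style tuning.
# All data and names below are illustrative placeholders.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold  # use GroupKFold for grouped data

rng = np.random.default_rng(0)
X_cols = ["x1", "x2", "x3", "x4"]
df = pd.DataFrame(rng.normal(size=(500, 4)), columns=X_cols)
df["y"] = (df["x1"] + df["x2"] > 0).astype(int)

# 1) Stratified 5-fold assignment stored as a column (analogue of Make Validation Column)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
df["fold"] = -1
for fold_id, (_, val_idx) in enumerate(skf.split(df[X_cols], df["y"])):
    df.loc[df.index[val_idx], "fold"] = fold_id

# 2) A small grid playing the role of a tuning table
tuning_table = [
    {"n_estimators": 50,  "learning_rate": 0.10, "max_depth": 3},
    {"n_estimators": 200, "learning_rate": 0.10, "max_depth": 3},
    {"n_estimators": 200, "learning_rate": 0.05, "max_depth": 5},
]

# 3) Tune on a single validation fold (fold 0), train on the others
train, valid = df[df["fold"] != 0], df[df["fold"] == 0]
best = None
for params in tuning_table:
    model = GradientBoostingClassifier(random_state=0, **params).fit(train[X_cols], train["y"])
    acc = accuracy_score(valid["y"], model.predict(valid[X_cols]))
    if best is None or acc > best[0]:
        best = (acc, params)
print("Best validation accuracy and parameters:", best)
```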
Note that this method is not suited for K-folds cross-validation, as the use of tuning tables implies a partition of your data into only 3 sets:
- Training set: used for the actual training of the model(s),
- Validation set: used for model optimization (for example hyperparameter fine-tuning and feature/threshold selection) and model selection,
- Test set: used to assess the generalization and predictive performance of the selected model on new/unseen data.
So if you specify a 5-folds cross-validation column in step 1, only the first 3 folds will be used, as training, validation and test sets respectively, not as a true 5-folds cross-validation. To do the cross-validation you intend, you would need a nested cross-validation: an inner cross-validation to tune the hyperparameters, and an outer cross-validation to assess the robustness of the hyperparameter values found. Nested cross-validation is only available in the Model Screening platform, which does not accept tuning tables for hyperparameter tuning (as the goal of this platform is to screen the most promising algorithms among a large variety of model types, not to fine-tune them).
As far as I know, this is not (directly) possible in JMP.
But you can still use K-folds cross-validation on a "default" Boosted Tree, or try other validation techniques following the method above (but creating a normal formula validation column with 3 sets), and use simulation on the tuned model to assess its robustness and its benefit vs. a non-tuned one.
You can see that in most cases the default hyperparameter values work quite well, and that hyperparameter tuning helps more with performance variability (metrics like RASE, R-square, ... have narrower ranges for the tuned algorithm than for the "default" one) than with the maximum or average performance.
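As an illustration of that comparison (again outside JMP, and not the Simulate feature itself), here is a sketch that repeats random train/test splits and compares the spread of the error for a "default" vs a "tuned" boosted tree; the data and the tuned values are purely illustrative:

```python
# Sketch only: compare variability of default vs tuned models over repeated resampling.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(400, 5)), columns=[f"x{i}" for i in range(5)])
y = X["x0"] * 2 + X["x1"] ** 2 + rng.normal(scale=0.5, size=400)

default_rmse, tuned_rmse = [], []
for seed in range(30):  # 30 resampling repeats
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)
    for params, scores in [({}, default_rmse),
                           ({"n_estimators": 300, "learning_rate": 0.05, "max_depth": 2}, tuned_rmse)]:
        model = GradientBoostingRegressor(random_state=0, **params).fit(X_tr, y_tr)
        scores.append(mean_squared_error(y_te, model.predict(X_te)) ** 0.5)

# A narrower range for the tuned model indicates more stable performance
for name, scores in [("default", default_rmse), ("tuned", tuned_rmse)]:
    print(f"{name}: mean RMSE={np.mean(scores):.3f}, range={np.ptp(scores):.3f}")
```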
You can check a similar post and its solution here, to have a look at validation techniques and the use of simulation: Solved: Re: Boosted Tree - Tuning TABLE DESIGN - JMP User Community
Nested cross-validation is typically not the first option I would recommend, as it requires a lot of computation: the algorithm is fine-tuned independently on each fold of the inner loop, and the performance is then computed on each validation fold of the outer loop.
K-folds cross-validation is a useful technique when you don't have a large quantity of data, but nested cross-validation still requires a fairly large amount of data to do the splitting correctly: for example, with 5 folds in the outer loop and 4 folds in the inner loop, you end up with 5 x 4 = 20 training/validation combinations, each of which still needs enough data!
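For reference, here is what that nested scheme looks like in Python/scikit-learn (a conceptual sketch with a made-up dataset and grid, not a JMP equivalent):

```python
# Sketch only: nested cross-validation. The inner loop tunes the hyperparameters,
# the outer loop assesses the robustness of the whole tuning procedure.
# With 5 outer folds and 4 inner folds, each grid candidate is fit 5 x 4 = 20 times.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = make_classification(n_samples=600, n_features=10, random_state=0)

param_grid = {"n_estimators": [50, 200], "learning_rate": [0.05, 0.1], "max_depth": [2, 4]}

# Inner loop: 4-fold grid search to tune the hyperparameters
inner = GridSearchCV(GradientBoostingClassifier(random_state=0), param_grid, cv=4)

# Outer loop: 5-fold assessment of the tuned model's performance
outer_scores = cross_val_score(inner, X, y, cv=5)
print("Outer-fold accuracies:", np.round(outer_scores, 3))
print("Mean +/- std:", outer_scores.mean().round(3), outer_scores.std().round(3))
```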
Finally, cross-validation is more a tuning technique than a validation technique to assess model robustness, as brilliantly described by Cassie Kozyrkov in this video: https://youtu.be/zqD0lQy_w40?si=lja79_aik0KO-jbB
Hope this answer helps you,
Victor GUILLER
L'Oréal Data & Analytics
"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)