Is there a way to do k-fold cross validation with boosted tree?

Is there a way to do k-fold cross validation with the boosted tree platform?

While I understand this can be done in Model Screening, I want to tune hyperparameters together with the k-fold setup that gives the best performance. So, instead of using Model Screening, is there a way to do k-fold cross-validation in the Boosted Tree predictive modeling platform while also tuning other hyperparameters?

 

Thank you for taking the time to get back to me.

 

Nisha

2 ACCEPTED SOLUTIONS

shampton82
Level VII

Re: Is there a way to do k-fold cross validation with boosted tree?

I'd try using the XGBoost app if you'd like to use K-fold and boosted tree.

XGBoost Add-In for JMP Pro - JMP User Community
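For context outside JMP, the same workflow (K-fold scoring of a boosted tree) can be sketched in Python with scikit-learn; the dataset and settings here are illustrative only, not the add-in's API:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Toy classification data standing in for your own table
X, y = make_classification(n_samples=300, random_state=0)

# 5-fold cross-validation of a boosted tree: one accuracy score per fold
scores = cross_val_score(GradientBoostingClassifier(random_state=0), X, y, cv=5)
print(len(scores))  # 5 folds -> 5 scores
```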

 

Steve

View solution in original post

Victor_G
Super User

Re: Is there a way to do k-fold cross validation with boosted tree?

Hi @NishaKumar2023,

 

You can use K-fold cross-validation while tuning your hyperparameters, but it doesn't work exactly as you may intend. Here is the methodology:

  1. Create a K-fold validation column (either fixed or a formula, depending on your objectives and reproducibility needs): Launch the Make Validation Column Platform (jmp.com)
    Make sure your data splitting into folds is representative and balanced by using stratification, and that you respect any constraints/duplication in your data by using grouping (e.g., the same ID in the same fold).
  2. Open or create a tuning data table for the Boosted Tree (or any other platform) you would like to launch. I attached an example for the Boosted Tree, but you can find other tuning tables provided by @SDF1 in this post: Malfunction in Bootstrap Forest with Tuning Design Table? 
  3. When launching the Boosted Tree (or any other modeling platform), specify your inputs and the response to model, and use the K-fold validation column in the validation panel (here on the Mushroom JMP dataset): 
    Victor_G_0-1723533425820.png
  4. A new window pops up; check "Use Tuning table" and then select your tuning table (already open): 
    Victor_G_1-1723533477035.png
  5. You'll then get the results from the best tuned models:
    Victor_G_2-1723533617587.png
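Outside JMP, the fold-assignment logic of step 1 (stratification, and grouping so that rows sharing an ID stay together) can be sketched with scikit-learn; data, sizes, and column names below are illustrative, not the Make Validation Column platform itself:

```python
import numpy as np
from sklearn.model_selection import GroupKFold, StratifiedKFold

# Toy data: 20 rows, a balanced binary response, and an ID used for grouping
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = np.array([0, 1] * 10)             # response to stratify on
groups = np.repeat(np.arange(10), 2)  # same ID -> must land in the same fold

# Stratified folds: each fold keeps the class balance of y
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
fold_col = np.empty(len(y), dtype=int)
for k, (_, test_idx) in enumerate(skf.split(X, y)):
    fold_col[test_idx] = k  # the "validation column": one fold label per row

# Grouped folds: rows sharing an ID never end up in different folds
gkf = GroupKFold(n_splits=5)
group_fold_col = np.empty(len(y), dtype=int)
for k, (_, test_idx) in enumerate(gkf.split(X, y, groups=groups)):
    group_fold_col[test_idx] = k
```

Either fold-label array plays the role of the K-fold validation column you would hand to the modeling platform.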

Note that this method is not true K-fold cross-validation: using a tuning table implies partitioning your data into only 3 sets:

  1. Training set: used for the actual training of the model(s),
  2. Validation set: used for model optimization (hyperparameter fine-tuning, feature/threshold selection, etc.) and model selection,
  3. Test set: used to assess the generalization and predictive performance of the selected model on new/unseen data.

So if you specify a 5-fold cross-validation in step 1, only the first 3 folds will be used (as training, validation, and test sets), not as a true 5-fold cross-validation. To do the cross-validation you intend, you would need nested cross-validation: an inner cross-validation to tune the hyperparameters, and an outer cross-validation to assess the robustness of the hyperparameter values found. This is only available in the Model Screening platform, which does not accept tuning tables for hyperparameter tuning (the goal of that platform is to screen the most promising algorithms among a large variety of model types, not to fine-tune them):

Victor_G_3-1723534064034.png

 

As far as I know, this is not (directly) possible in JMP.
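For reference, the nested scheme described above can be sketched outside JMP with scikit-learn; the dataset, the 4/5 fold counts, and the hyperparameter grid are illustrative placeholders, not the JMP tuning-table values:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

X, y = make_classification(n_samples=200, random_state=0)

# Inner loop (4 folds): tune hyperparameters by grid search
param_grid = {"n_estimators": [50, 100], "learning_rate": [0.05, 0.1]}
inner = KFold(n_splits=4, shuffle=True, random_state=1)
tuner = GridSearchCV(GradientBoostingClassifier(random_state=0),
                     param_grid, cv=inner)

# Outer loop (5 folds): assess robustness of the tuned model;
# each outer split re-runs the whole inner tuning from scratch
outer = KFold(n_splits=5, shuffle=True, random_state=2)
scores = cross_val_score(tuner, X, y, cv=outer)

print(scores.mean(), scores.std())  # spread across outer folds = robustness
```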

But you can still use K-fold cross-validation on a "default" Boosted Tree, or try other validation techniques following the method above (creating a normal formula validation column with 3 sets) and using simulation on the tuned model to assess its robustness and benefit vs. a non-tuned one:

Victor_G_4-1723535038590.png

You can see that in most cases the default hyperparameter values work quite well, and that hyperparameter tuning helps more with performance variability (metrics like RASE and R-square have narrower ranges for the tuned algorithm than for the "default" one) than with the maximum or average performance. 

 

You can check a similar post and its solution for a closer look at validation techniques and the use of simulation: Solved: Re: Boosted Tree - Tuning TABLE DESIGN - JMP User Community 

 

Nested cross-validation is typically not the first option I would recommend, as it requires a lot of computation: the algorithm is fine-tuned independently on each fold of the inner loop, and performance is then calculated on each validation fold of the outer loop.
K-fold cross-validation is a useful technique when data is scarce, but nested cross-validation still requires a fairly large amount of data to do the splitting correctly: for example, if you split the inner loop into 4 folds and the outer loop into 5 folds, you end up creating 20 folds/groups in your dataset!
Finally, cross-validation is more a tuning technique than a validation technique for assessing model robustness, as brilliantly described by Cassie Kozyrkov in this video: https://youtu.be/zqD0lQy_w40?si=lja79_aik0KO-jbB 
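The fold-count arithmetic above can be checked directly; the number of model fits also depends on the size of the hyperparameter grid, so `grid_points` below is a hypothetical value:

```python
inner_folds = 4
outer_folds = 5
grid_points = 8  # hypothetical number of hyperparameter combinations tried

# Each outer split spawns a full inner cross-validation of its own
fold_assignments = inner_folds * outer_folds  # 20 inner folds overall

# Grid-search-style tuning: grid_points x inner_folds fits per outer split,
# plus one refit of the best candidate before outer evaluation
model_fits = outer_folds * (grid_points * inner_folds + 1)

print(fold_assignments, model_fits)  # 20 165
```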

 

I hope this answer helps,

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)

View solution in original post

4 REPLIES 4

Re: Is there a way to do k-fold cross validation with boosted tree?

@shampton82 Thank you so much, this allowed me to create the k-fold validation column and run the Boosted Tree!


Re: Is there a way to do k-fold cross validation with boosted tree?

Thank you so much @Victor_G! I found your explanation very helpful, especially for the next step in hyperparameter tuning. I was able to create the k-fold validation column via XGBoost and run the Boosted Tree, and your directions and resources will help with the next few steps. Thank you!