Solved: Is there a way to do k-fold cross validation with boosted tree?

NishaKumar2023 · Aug 12, 2024 02:46 PM

Is there a way to do k-fold cross validation with the boosted tree platform?

While I understand this can be done in model screening, I need to hypertune with the k-fold that provides the maximum performance. So, instead of using model screening, is there a way to do k-fold cross validation with boosted tree predictive modeling where I can also hypertune other parameters?

Thank you for taking the time to get back to me.

Nisha

shampton82 · Aug 12, 2024 04:49 PM

I'd try using the XGBoost app if you'd like to use K-fold and boosted tree.

XGBoost Add-In for JMP Pro - JMP User Community

Steve

Victor_G · Aug 13, 2024 1:27 AM

Hi @NishaKumar2023,

You can use K-fold cross-validation while tuning your hyperparameters, but it doesn't work exactly the way you may intend. Here is the methodology:

1. Create a K-fold validation column (either fixed or a formula, depending on your objectives and reproducibility needs); see Launch the Make Validation Column Platform (jmp.com).
2. Make sure the split into folds is representative and balanced by using stratification, or that it respects the constraints/duplication in your data by using grouping (for example, the same ID always in the same fold).
3. Open or create a tuning data table for the Boosted Tree (or any other platform) you would like to launch. I have attached an example for the Boosted Tree, but you can find other tuning tables provided by @SDF1 in this post: Malfunction in Bootstrap Forest with Tuning Design Table?
4. When launching the Boosted Tree (or any other modeling platform), specify your inputs and the response to model, and use the K-fold validation column in the validation panel (shown here on the Mushroom JMP dataset).
5. A new window pops up; check "Use Tuning table" and then select your tuning table, which must already be open.
6. You'll then get the results provided by the best tuned models.
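
To make the stratification and grouping idea from the steps above concrete outside of JMP, here is a minimal sketch in plain Python. It is not what the Make Validation Column platform does internally; the function names `stratified_folds` and `grouped_folds` and the example labels are hypothetical and only illustrate the principle.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k, seed=0):
    """Assign each row a fold id in 0..k-1 so class proportions stay similar across folds."""
    rng = random.Random(seed)
    rows_by_class = defaultdict(list)
    for row, label in enumerate(labels):
        rows_by_class[label].append(row)
    folds = [None] * len(labels)
    offset = 0
    for label in sorted(rows_by_class):
        rows = rows_by_class[label]
        rng.shuffle(rows)
        # Deal rows round-robin, continuing where the previous class stopped.
        for i, row in enumerate(rows):
            folds[row] = (offset + i) % k
        offset += len(rows)
    return folds

def grouped_folds(group_ids, k, seed=0):
    """Assign folds so that all rows sharing a group id land in the same fold."""
    rng = random.Random(seed)
    groups = sorted(set(group_ids))
    rng.shuffle(groups)
    fold_of_group = {g: i % k for i, g in enumerate(groups)}
    return [fold_of_group[g] for g in group_ids]

# Hypothetical example: 60 "edible" and 40 "poisonous" rows split into 5 folds.
labels = ["edible"] * 60 + ["poisonous"] * 40
folds = stratified_folds(labels, k=5)
```

With stratification, each fold keeps roughly the same class mix as the full table; with grouping, repeated measurements of the same ID cannot leak between folds.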

Note that this method is not true K-fold cross-validation, as the use of tuning tables implies a partition of your data into only 3 sets:

Training set: used for the actual training of the model(s),
Validation set: used for model optimization (hyperparameter fine-tuning, feature/threshold selection, for example) and model selection,
Test set: used to assess the generalization and predictive performance of the selected model on new/unseen data.

So if you specify 5-fold cross-validation in step 1, only the first 3 folds will be used, as training, validation and test sets; this is not really 5-fold cross-validation. To do the cross-validation you intend, you would need nested cross-validation: an inner cross-validation to tune the hyperparameters, and an outer cross-validation to assess the robustness of the hyperparameter values found. Nested cross-validation is only available in the Model Screening platform, which does not accept tuning tables for hyperparameter tuning (the goal of that platform is to screen the most promising algorithms among a large variety of model types, not to fine-tune them).

As far as I know, this is not (directly) possible in JMP.
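
For readers who want to see what the nested scheme looks like in general, here is a minimal plain-Python sketch, independent of JMP. The helper names (`kfold_indices`, `nested_cv`) and the user-supplied `score(candidate, train_rows, eval_rows)` function are assumptions made for illustration; rows are split into contiguous folds, so in practice you would shuffle or stratify first.

```python
def kfold_indices(n_rows, k):
    """Yield (train_rows, held_out_rows) for each of k contiguous folds."""
    sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        held_out = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_rows))
        yield train, held_out
        start += size

def nested_cv(n_rows, outer_k, inner_k, candidates, score):
    """Return one (best_candidate, outer_score) pair per outer fold.

    score(candidate, train_rows, eval_rows) is supplied by the caller;
    higher values are assumed to be better.
    """
    results = []
    for outer_train, outer_test in kfold_indices(n_rows, outer_k):

        def inner_mean(candidate):
            # Inner loop: tune using only the rows of the outer training set.
            scores = []
            for tr, va in kfold_indices(len(outer_train), inner_k):
                train_rows = [outer_train[i] for i in tr]
                valid_rows = [outer_train[i] for i in va]
                scores.append(score(candidate, train_rows, valid_rows))
            return sum(scores) / len(scores)

        best = max(candidates, key=inner_mean)
        # Outer loop: assess the selected candidate on rows it never saw.
        results.append((best, score(best, outer_train, outer_test)))
    return results
```

The spread of the outer scores, and whether the same candidate wins in every outer fold, is what indicates how robust the tuned hyperparameters are.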

But you can still use K-fold cross-validation on a "default" Boosted Tree, or try other validation techniques: follow the method above (but create a normal formula validation column with 3 sets) and use simulation on the tuned model to assess its robustness and its benefit vs. a non-tuned one.

You can see that in most cases the default hyperparameter values work quite well, and that hyperparameter tuning helps more with the variability of the performance (metrics like RASE, R-square, ... have narrower ranges for the tuned algorithm than for the "default" one) than with the maximum or average performance.

You can check a similar post and solution here on this topic to have a look at validation techniques and the use of simulation: Solved: Re: Boosted Tree - Tuning TABLE DESIGN - JMP User Community

Nested cross-validation is typically not the first option I would recommend, as it requires a lot of computation: you have to fine-tune the algorithm independently on each of the folds of the inner loop, and then calculate the performance on each validation fold of the outer loop.
K-fold cross-validation is a useful technique when you don't have a large quantity of data, but nested cross-validation still requires quite a large amount to do the data splitting correctly: for example, if you split the inner loop into 4 folds and the outer loop into 5 folds, you need to create 20 folds/groups in your dataset!
Finally, cross-validation is more a tuning technique than a validation technique for assessing model robustness, as brilliantly described by Cassie Kozyrkov in this video: https://youtu.be/zqD0lQy_w40?si=lja79_aik0KO-jbB
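
As a rough illustration of the computational cost, the number of model fits can be counted as follows. This is a back-of-the-envelope sketch; the function name `nested_cv_fit_count` and the assumption of one refit per outer fold are mine, not something JMP reports.

```python
def nested_cv_fit_count(outer_k, inner_k, n_candidates):
    """Count model fits for nested cross-validation.

    Assumes every hyperparameter candidate is fitted on every inner split,
    and the winning candidate is refitted once per outer fold.
    """
    inner_fits = outer_k * inner_k * n_candidates
    outer_refits = outer_k
    return inner_fits + outer_refits

# 5 outer folds x 4 inner folds gives 20 inner splits; with 20 tuning-table rows:
fits = nested_cv_fit_count(outer_k=5, inner_k=4, n_candidates=20)  # 405 fits
```

So a tuning table that takes 20 fits with a single training/validation split would take about 20 times as many with this 5 x 4 nested scheme.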

Hope this answer will help you,

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)

NishaKumar2023 · Aug 16, 2024 11:41 AM

@shampton82 Thank you so much, this allowed me to create the k-fold (validation) column and run the boosted tree!

NishaKumar2023 · Aug 16, 2024 11:43 AM

Thank you so much @Victor_G! I found your explanation very helpful, especially for the next step in hypertuning. I was able to create the k-fold validation column via XGBoost and run the boosted tree, and I will find your directions and resources helpful for the next few steps. Thank you!
