cross validation using k-fold fit quality
I am using a Lasso fit with leave-one-out or K-fold cross validation. Please advise on the best way to view R-square and other fit quality metrics (e.g., AIC) in the output. It would be helpful to have these for the training set and for the validation set (averaged over all hold-outs).
Re: cross validation using k-fold fit quality
Hi @daniel_s,
Welcome to the Community!
To get performance metrics for your LASSO regression on the individual folds and on average, I think the easiest way is to launch Generalized Regression from the Model Screening platform: check only the options "Generalized Regression" and "Additional Methods", specify the types of terms that can enter the model (for example, interactions and quadratic effects), the number of folds, and a seed for reproducibility (if needed).
Once the platform is launched, a new window will open with all the information about the individual folds and a summary across folds.
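In case it helps to see what "per fold" and "on average" mean concretely, here is a minimal conceptual sketch in Python with scikit-learn (not JMP's implementation; the synthetic data, the penalty value alpha=0.1, and the 5 folds are arbitrary choices for illustration):

```python
# Conceptual sketch: per-fold and averaged R-square for a K-fold
# cross-validated Lasso. JMP's Generalized Regression / Model Screening
# platforms report their own versions of these metrics.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold, cross_validate

# Synthetic data purely for illustration
X, y = make_regression(n_samples=100, n_features=10, noise=5.0, random_state=1)

lasso = Lasso(alpha=0.1)                               # penalty value chosen arbitrarily
cv = KFold(n_splits=5, shuffle=True, random_state=1)   # fixed seed for reproducibility

scores = cross_validate(lasso, X, y, cv=cv, scoring="r2", return_train_score=True)

print("Training R2 per fold:   ", np.round(scores["train_score"], 3))
print("Validation R2 per fold: ", np.round(scores["test_score"], 3))
print("Average validation R2:  ", round(scores["test_score"].mean(), 3))
```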
Hope this answer helps,
"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)
Re: cross validation using k-fold fit quality
Thank you. That is helpful. I gather that I would then select the best fold and "Run Selected", which would give the Lasso fit results with that best fold used as the validation set?
Re: cross validation using k-fold fit quality (Accepted Solution)
Hi @daniel_s,
After you have run your LASSO model with K-fold cross validation, there are indeed several ways to proceed with the results:
- Choose the best-performing LASSO model based on the "best" validation fold: Not recommended, as this amounts to "cherry picking" rather than an honest assessment and selection procedure. It is more like selecting the right data for the model instead of fitting the right model to your data, so you may end up overfitting your validation data.
- After assessing the consistency and robustness of the results, retrain the model on all data: This approach can seem logical: once you have verified that the model is robust and gives similar results across all folds, you could be tempted to use all of the data to further improve it. It can be a viable option if you are sure the model's settings (for example, the terms included and the penalty value) can be kept the same between the cross-validated fit and the fit on all data, so that the final model does not overfit the whole dataset. The drawback is that you lose sight of model validation, so if anything goes wrong on the test data, it is hard to debug the model without validation data.
- Create a model average of your K models: This is my preferred approach (when possible); see the sketch after this list. Once you have your K models, run each one and save its prediction formula with "Publish Prediction Formula" to store the models in the Formula Depot.
Once the model formulas are in the Formula Depot, click on the red triangle next to "Formula Depot" and select "Model Comparison". This creates a short summary of the performance of your models, and from the red triangle next to "Model Comparison" you can then create a Model Averaging.
This option creates a new formula in your data table that corresponds to the average equation of your K models (in my case, the average equation/model of my 5 individual cross-validated models).
You can then compare the performance of your K individual models and your average model once again, using the same Model Comparison platform.
Note that this approach may not be easy or feasible if you have a large number of folds and/or if the models are complex (like neural networks). For simple models such as regression models and machine learning "base models" (decision trees, SVM, kNN, ...), it helps avoid overfitting and supports robustness and generalization, without "losing" any data to validation.
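If it helps to see what the averaged formula computes, here is a minimal conceptual sketch in Python with scikit-learn of averaging K fold-models (an illustration of the idea only, not the Formula Depot mechanism; the synthetic data, penalty value, and fold count are arbitrary assumptions):

```python
# Conceptual sketch: fit one Lasso per fold (on that fold's training portion),
# then average the K models' predictions. This mirrors the idea behind
# averaging the K cross-validated models, not JMP's exact implementation.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold

# Synthetic data purely for illustration
X, y = make_regression(n_samples=100, n_features=10, noise=5.0, random_state=1)
X_new = X[:5]                                          # pretend these are new rows to score

cv = KFold(n_splits=5, shuffle=True, random_state=1)
fold_models = [Lasso(alpha=0.1).fit(X[train_idx], y[train_idx])
               for train_idx, _ in cv.split(X)]

# The averaged "model" is simply the mean of the K individual predictions.
avg_prediction = np.mean([m.predict(X_new) for m in fold_models], axis=0)
print(avg_prediction)
```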
You can read more about cross validation in the following posts:
CROSS VALIDATION - VALIDATION COLUMN METHOD
I also highly recommend the playlist "Making Friends with Machine Learning" from Cassie Kozyrkov to learn more about model training, validation, and testing: Making Friends with Machine Learning - YouTube
Hope this response helps and answers your questions,
"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)
Re: cross validation using k-fold fit quality
Thank you! Your insights were just right for the Machine learning trajectory I am on.