MarkusJH
Level I

Training and Validation R2s in Lasso/Elastic net

Dear all,

 

I use k-fold cross-validation in combination with penalized regression models (lasso, elastic net) and want to examine the (averaged) R2 values of the training and validation sets to check for overfitting and explained variance. I have already searched through the many options, but I cannot find this. Is there a way to show it? It would be great if it could be shown in the overview table, so that I can use it for model selection.
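
To make concrete what I am after, here is a minimal sketch (in Python/scikit-learn, with made-up data and parameters, purely to illustrate the kind of numbers I would like JMP to report):

```python
# Minimal sketch (scikit-learn, not JMP): averaged training and
# validation R^2 across k folds for an elastic net model.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold

X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=1)

train_r2, val_r2 = [], []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=1).split(X):
    model = ElasticNet(alpha=0.5, l1_ratio=0.5).fit(X[train_idx], y[train_idx])
    train_r2.append(r2_score(y[train_idx], model.predict(X[train_idx])))
    val_r2.append(r2_score(y[val_idx], model.predict(X[val_idx])))

# A large gap between the two averages would point to overfitting.
print(f"mean training R^2:   {np.mean(train_r2):.3f}")
print(f"mean validation R^2: {np.mean(val_r2):.3f}")
```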

 

Thank you, Markus

5 REPLIES
Victor_G
Super User

Re: Training and Validation R2s in Lasso/Elastic net

Hi @MarkusJH,

 

Welcome to the Community!

 

To answer your question, one option could be to use the "Model Screening" platform, available under "Analyze" > "Predictive Modeling".

Once in the "Model Screening" platform, you can deselect methods other than "Generalized Regression" and "Additional Methods" (here done on the Titanic dataset), and specify whether you want to include two-factor interactions or quadratic terms in the model. You can also specify the number of folds for cross-validation, and a random seed if you want reproducible results:

[Screenshot: Victor_G_0-1675067635901.png]

When the analysis is done, you can see the validation results as a summary across the folds. This summary table can help you select the most appropriate model(s): select the model(s) you want to see in more detail (or click on "Select Dominant"), then click on "Run Selected" to open the Generalized Regression platform with more details about the selected model(s):

[Screenshot: Victor_G_3-1675068081233.png]

 

You can also have a look at individual results by fold (if needed) by opening the "Training" and "Validation" menus (here on "Training"; the fold identifier can be seen in the last column for each type of model trained):

[Screenshot: Victor_G_2-1675067919919.png]
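
(If it helps to see the idea outside JMP: here is a minimal sketch of such a fold-wise screen, written in Python/scikit-learn with made-up models, data, and penalty values, just to illustrate the summary-across-folds logic — it is not what Model Screening does internally:)

```python
# Rough non-JMP analogue of a fold-wise model screen: cross-validate
# several penalized regressions with a fixed seed and summarize the
# validation R^2 across folds for each one.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.model_selection import KFold, cross_val_score

X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=1)
cv = KFold(n_splits=5, shuffle=True, random_state=1)  # fixed seed -> reproducible folds

for name, model in [("Lasso", Lasso(alpha=0.5)),
                    ("Ridge", Ridge(alpha=1.0)),
                    ("Elastic Net", ElasticNet(alpha=0.5, l1_ratio=0.5))]:
    scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
    print(f"{name:>12}: mean validation R^2 = {scores.mean():.3f} (sd {scores.std():.3f})")
```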

 

I hope I understood your question and that this answer helps you.
If I misunderstood something and this is not what you expected, could you provide more details on which platform you currently use, what information you already have, and what is missing?

 

Victor GUILLER
L'Oréal Data & Analytics

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)
MarkusJH
Level I

Re: Training and Validation R2s in Lasso/Elastic net

Thank you, but I already know the Model Screening platform. It runs simplified models and does not provide the options of the full platforms. For instance, I assume that the neural model in Model Screening is a rather simple one with only a few hidden units.

When I want to do nested cross-validation with a more complex neural model, with two layers for instance, this cannot be done with Model Screening, can it?

There are also other things that should be done within the inner loop only. For instance, imputation is often done within the inner loop only, because it would be unfair to use the whole sample, including the independent test sample, for imputation.

 

Therefore, I am looking for a script with which nested cross-validation can be done. I could then, for instance, insert imputation within the inner loop only.
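
To sketch the logic I have in mind (here in Python/scikit-learn purely as stand-in pseudocode for the eventual script; the data, missingness, and parameters are made up): by putting the imputer inside the cross-validated pipeline, it is re-fit on each training fold only and never sees the held-out data.

```python
# Imputation inside the loop: cross_val_score re-fits the whole
# pipeline (imputer + model) on each training fold, so the held-out
# fold never contributes to the imputation.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.impute import SimpleImputer
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline

X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=1)
rng = np.random.default_rng(1)
X[rng.random(X.shape) < 0.05] = np.nan  # introduce some missing values

pipe = make_pipeline(SimpleImputer(strategy="mean"),
                     ElasticNet(alpha=0.5, l1_ratio=0.5))
scores = cross_val_score(pipe, X, y,
                         cv=KFold(n_splits=5, shuffle=True, random_state=1),
                         scoring="r2")
print("validation R^2 per fold:", np.round(scores, 3))
```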

MarkusJH
Level I

Re: Training and Validation R2s in Lasso/Elastic net

... or another thing one might want to do in the inner loop is hyperparameter selection, e.g., choosing the number of hidden units.

Of course, at the end of the inner loop there must be a criterion that defines the best model, which is then submitted to the outer-loop cross-validation.
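
Again only as a sketch of that logic (Python/scikit-learn as stand-in pseudocode; the grid values and data are arbitrary): the inner loop is a grid search over the number of hidden units, its scoring criterion defines the best model, and that whole selection procedure is what the outer loop cross-validates.

```python
# Nested cross-validation sketch: the inner GridSearchCV picks the
# hidden-layer sizes on the inner folds; the outer loop then scores
# the entire selection procedure on held-out folds.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=1)

inner = GridSearchCV(
    MLPRegressor(max_iter=2000, random_state=1),
    # two hidden layers; the grid varies the number of units per layer
    param_grid={"hidden_layer_sizes": [(5, 5), (10, 10), (20, 20)]},
    cv=KFold(n_splits=3, shuffle=True, random_state=1),
    scoring="r2",  # the inner-loop criterion that defines the "best" model
)
outer_scores = cross_val_score(inner, X, y,
                               cv=KFold(n_splits=5, shuffle=True, random_state=1),
                               scoring="r2")
print("outer-loop validation R^2 per fold:", np.round(outer_scores, 3))
```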

 

gemmahodgson
Level III

Re: Training and Validation R2s in Lasso/Elastic net

I asked JMP a similar question about elastic net regression and how I could get out some form of R-squared, and the reply was that it is not given for k-fold or leave-one-out validation methods, only for AICc and BIC. So I think you have to calculate it manually yourself. I find that quite annoying, as SAS can do this.
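
If you save the predictions to the data table, the manual calculation itself is just one minus SSE over SST. As a sketch (here in Python, with hypothetical column names; a JMP column formula could do the same):

```python
# Manual R^2 from saved predictions: R^2 = 1 - SSE/SST.
import numpy as np

def r_squared(actual, predicted):
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    sse = np.sum((actual - predicted) ** 2)      # residual (error) sum of squares
    sst = np.sum((actual - actual.mean()) ** 2)  # total sum of squares
    return 1.0 - sse / sst

# Hypothetical usage on the validation rows of one fold:
# r2_fold = r_squared(df.loc[df["Fold"] == 1, "Y"],
#                     df.loc[df["Fold"] == 1, "Pred Y"])
```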

scott_allen

Re: Training and Validation R2s in Lasso/Elastic net

@MarkusJH 

You can compare the R2 values across k folds using the Model Comparison platform in the Analyze > Predictive Modeling menu. Follow these steps:

  1. Create a K Fold column using the Make Validation Column utility and select "Make K Fold Column" (in the bottom left of the dialog window).
  2. Launch Gen Reg and load your K Fold column in the Validation role.
  3. Run your regression model (Elastic Net, Lasso, etc.) and make sure that Validation Column is selected as the Validation method.
  4. When you are satisfied with the model, save the Prediction Formula to the data table.
  5. To view the R2 value for each fold, launch Model Comparison from the Analyze > Predictive Modeling menu.
  6. Load your saved formula column as the Y and the K Fold validation column as the Group.
  7. The resulting report will show you the measures of fit for each fold (a rough analogue outside JMP is sketched after the screenshot):
    [Screenshot: scott_allen_0-1724435531826.png]
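
(Outside JMP, the same per-fold table can be approximated from the saved prediction column; here is a rough sketch in Python/pandas, where the file name and the column names "Y", "Pred Formula Y", and "KFold" are hypothetical stand-ins for your own:)

```python
# Rough non-JMP analogue of the Model Comparison report: group the
# exported data table by the K Fold column and compute R^2 per fold.
import pandas as pd
from sklearn.metrics import r2_score

# Hypothetical export of the JMP table with the saved prediction formula
df = pd.read_csv("data_with_predictions.csv")

per_fold_r2 = df.groupby("KFold").apply(
    lambda g: r2_score(g["Y"], g["Pred Formula Y"]))
print(per_fold_r2)
```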

-Scott