Hi @mjz5448,
Just to give some explanations about the differences between LASSO and Ridge (Elastic Net being a mix of the LASSO and Ridge penalties):
- LASSO can set the coefficients of features it does not consider "interesting" exactly to zero, so the model performs some automatic feature selection on its own. This reduces model complexity and helps avoid overfitting. However, the penalty biases the coefficients (shrinkage), and LASSO regression can be unstable when trained on data with correlated features: one of the features gets selected somewhat arbitrarily, and all of the other features highly correlated with it are effectively dropped from the model. This may lead someone to erroneously conclude that only the retained feature is important, when in reality some of the dropped features may be just as important or even more so.
LASSO is robust to outliers and tends to be most effective when there is a small subset of variables with a strong effect on the response among many other variables with small or no effects.
- Ridge shrinks the coefficients of correlated features toward zero (close to 0, but never exactly 0). This makes Ridge regression usable on datasets with many correlated features, since their negative impact is minimized, and it also helps reduce overfitting. As a penalty is introduced, you'll also get biased coefficient estimates.
Ridge keeps all features in the model and tends to be most effective when there is a large number of variables with large and comparable effects.
- Elastic Net combines both penalties: it shrinks some coefficients exactly to 0 and minimizes others, so it can be a good compromise when you want feature selection as well as sensible handling of correlated features with similar importance on the response (the small code sketch below illustrates all three behaviours).
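If you want to see these three behaviours outside of JMP, here is a minimal scikit-learn sketch on simulated data; the data, penalty strengths, and feature setup are made up purely for illustration:

```python
# Illustration of the three penalties on simulated data with two highly
# correlated predictors (x1, x2) and several pure-noise predictors.
import numpy as np
from sklearn.linear_model import Lasso, Ridge, ElasticNet
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)      # nearly a copy of x1
noise = rng.normal(size=(n, 4))          # irrelevant features
X = StandardScaler().fit_transform(np.column_stack([x1, x2, noise]))
y = 3 * x1 + rng.normal(size=n)          # only x1 truly drives the response

for model in (Lasso(alpha=0.1), Ridge(alpha=1.0),
              ElasticNet(alpha=0.1, l1_ratio=0.5)):
    model.fit(X, y)
    print(type(model).__name__, np.round(model.coef_, 2))
# Typically: LASSO keeps one of x1/x2 and zeroes the rest; Ridge spreads
# the effect across x1 and x2 with small coefficients on the noise terms;
# Elastic Net sits in between.
```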
You can learn more about Generalized (and penalized) regression models in this Mastering JMP session: Using Generalized Regression in JMP® Pro to Create Robust Linear Models
For feature selection, I would highly recommend looking at feature importances from a Random Forest: this model doesn't require extensive fine-tuning, handles correlated features, and is relatively robust to overfitting, so it can provide a good benchmark for comparison.
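As a rough sketch of that benchmark idea outside of JMP (the dataset here is a synthetic stand-in for yours):

```python
# Rank features by Random Forest importance on a synthetic dataset.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=300, n_features=8, n_informative=3,
                       random_state=0)  # stand-in for your own data
forest = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, y)

# Print features from most to least important.
for idx in np.argsort(forest.feature_importances_)[::-1]:
    print(f"feature {idx}: importance {forest.feature_importances_[idx]:.3f}")
```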
I'm not entirely sure what your objective is with this model, since you mention validation data: system understanding (explanatory objective) or prediction?
Is it validation data (used for model optimization, e.g. hyperparameter fine-tuning and feature/threshold selection, and for model selection), or test data (used to assess the generalization and predictive performance of the selected model on new/unseen data)?
- In the first case (validation data), you could use this dataset as your validation method in the Generalized Regression launch panel, to make sure the penalty is correctly set and fixed for your data, and compare the different models on this validation data: penalized regression methods, Random Forest, ...
- In the second case (test data), I would recommend not touching this dataset until you have compared the models and selected the most promising one. Depending on your main objective (system understanding or prediction) and your validation method (an information criterion like AICc or BIC, or K-fold/holdback/leave-one-out validation if you are interested in predictive performance), you can compare the models' outcomes on your training/validation data and select the most interesting model. Once it is chosen, you can confirm its generalization and predictive performance on the test set (unseen data).
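For what it's worth, here is a sketch of that train/validation/test workflow in code (in JMP you would do this with a validation column in the launch panel; the data, candidate models, and penalty values below are hypothetical):

```python
# Sketch of the workflow: tune/compare on validation, confirm once on test.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNet
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=10, noise=5.0,
                       random_state=0)  # placeholder data

# 60 % train, 20 % validation, 20 % test.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4,
                                                  random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5,
                                                random_state=0)

# Use the validation set to tune the penalty and compare candidate models.
candidates = {}
for alpha in (0.01, 0.1, 1.0):
    enet = ElasticNet(alpha=alpha, l1_ratio=0.5).fit(X_train, y_train)
    candidates[f"ElasticNet(alpha={alpha})"] = enet
candidates["RandomForest"] = RandomForestRegressor(
    n_estimators=300, random_state=0).fit(X_train, y_train)

best_name = max(candidates,
                key=lambda k: r2_score(y_val, candidates[k].predict(X_val)))
best = candidates[best_name]

# Only the selected model touches the test set, once, at the very end.
print(best_name, "test R2:",
      round(r2_score(y_test, best.predict(X_test)), 3))
```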
Hope this answer helps,
Victor GUILLER
"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)