Hi @Mittman,
Although I'm not sure what the actual underlying algorithm is, there is only very limited tuning done on the Lasso modeling by L1 penalization. The Lasso is pretty good at shrinking unimportant factors, but it isn't all that great when you have multicollinearity among the factors.

Something to keep in mind: when you use cross-validation (via the validation column), the algorithm uses that hold-out set to decide how far to shrink the factor estimates. This is what drives unimportant factors to 0 and simplifies the model. If the validation involves random partitioning (e.g., K-fold), re-running (even with the same grid spacing/spacing type and penalty fraction) can give you a different model, because the data get split differently each time.

If you want to compare models fit with your validation column against ones using AICc or K-fold, then for each model you generate, go to the red triangle menu next to the fit and select Save Columns > Save Prediction Formula. You can then compare how well the different models fit the response, e.g., in the Model Comparison platform.
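To make that cross-validation point concrete, here's a minimal sketch in Python with scikit-learn rather than JMP (the data, penalties, and seeds are all made up for illustration). It shows L1 penalization driving noise factors to 0, and two fits that differ only in how the data are partitioned for cross-validation, analogous to re-running with a different validation scheme:

```python
# Sketch (scikit-learn, not JMP): L1 penalization shrinks unimportant
# coefficients to 0, and the CV partitioning can change which model you get.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.normal(size=(n, p))
# Only the first two factors actually matter; the other eight are noise.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Two fits that differ only in how the rows are partitioned for CV.
fit_a = LassoCV(cv=KFold(5, shuffle=True, random_state=1)).fit(X, y)
fit_b = LassoCV(cv=KFold(5, shuffle=True, random_state=2)).fit(X, y)

print("fit A nonzero factors:", np.flatnonzero(fit_a.coef_))
print("fit B nonzero factors:", np.flatnonzero(fit_b.coef_))
print("chosen penalties:", fit_a.alpha_, fit_b.alpha_)
```

The two chosen penalties (and sometimes the selected factors) can differ, even though the data and the penalty grid are identical.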
Some other thoughts:
1. You might consider the Elastic Net, as it combines the L1 (Lasso) and L2 (Ridge) penalties. Kind of the best of both worlds.
2. You might want to run some bootstrap or simulations on the estimates for the factors to determine their magnitude and whether they really are important or not. Be sure to compare each estimate's magnitude to its Std Error to see whether the error is larger than the estimate itself.
3. If you're working on building models, it's a good idea to run through other modeling platforms in JMP to see which model fits better. It could be that a KNN, SVM, or NN model actually outperforms the others in terms of predictive capability. Be sure to look at some of the other modeling platforms like SLS, GenLinReg, etc. You never know which model will end up outperforming the rest.
4. If you're using a validation column for cross validation, you might consider how you're partitioning the data into training and validation, as this can affect model performance.
5. If your data come from a DOE and you're using the GenReg platform, a good rule of thumb is to limit yourself to its first five estimation methods.
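To illustrate point 1, here's a rough Elastic Net sketch in scikit-learn rather than JMP (the data and the l1_ratio grid are invented). The l1_ratio parameter is the L1/L2 mix: 1.0 is pure Lasso, values near 0 approach Ridge, and cross-validation picks both the mix and the penalty strength:

```python
# Sketch (scikit-learn, not JMP): Elastic Net blends the L1 (Lasso) and
# L2 (Ridge) penalties, which helps when factors are highly correlated.
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)                   # pure noise factor
X = np.column_stack([x1, x2, x3])
y = 2.0 * x1 + 2.0 * x2 + rng.normal(scale=0.5, size=n)

# Cross-validate over both the penalty strength and the L1/L2 mix.
fit = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9, 1.0], cv=5).fit(X, y)
print("chosen l1_ratio:", fit.l1_ratio_)
print("coefficients:", fit.coef_)
```

With correlated factors, the Ridge component tends to share weight between x1 and x2 instead of arbitrarily dropping one, which is where pure Lasso often struggles.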
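For point 2, here's a sketch of the bootstrap idea, again in scikit-learn rather than JMP's built-in Bootstrap option (the penalty value, sample sizes, and data are all made up). It resamples the rows, refits, and compares each estimate's magnitude to its bootstrap standard error:

```python
# Sketch (scikit-learn, not JMP): bootstrap the Lasso coefficient estimates
# and flag factors whose std error exceeds the estimate itself.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 0] + rng.normal(scale=1.0, size=n)  # only factor 0 matters

boot_coefs = []
for _ in range(200):
    idx = rng.integers(0, n, size=n)          # resample rows with replacement
    boot_coefs.append(Lasso(alpha=0.1).fit(X[idx], y[idx]).coef_)
boot_coefs = np.array(boot_coefs)

est = Lasso(alpha=0.1).fit(X, y).coef_        # estimates on the full data
se = boot_coefs.std(axis=0)                   # bootstrap std errors
for j in range(p):
    flag = "keep" if abs(est[j]) > se[j] else "noise?"
    print(f"factor {j}: estimate={est[j]:+.3f}  SE={se[j]:.3f}  -> {flag}")
```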
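And for point 3, a sketch of scoring several model families on the same data (scikit-learn stand-ins for JMP's platforms; the data and hyperparameters are invented). The point is simply that you compare them on the same cross-validated metric and let the data decide:

```python
# Sketch (scikit-learn, not JMP): score several model families with the
# same CV folds and metric; you never know which one wins beforehand.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
# A nonlinear response, where a linear Lasso is at a disadvantage.
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=200)

models = {
    "Lasso": LassoCV(cv=5),
    "KNN": KNeighborsRegressor(n_neighbors=10),
    "SVM": SVR(),
    "NN": MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
}
scores = {name: cross_val_score(m, X, y, cv=5, scoring="r2").mean()
          for name, m in models.items()}
for name, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: mean CV R^2 = {s:.3f}")
```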
Hope this helps!
DS