
Fitting a Penalized Regression (Lasso) Model

Started 06-10-2020
Modified 12-03-2021

Learn more in our free online course:
Statistical Thinking for Industrial Problem Solving

In this video, we show how to fit a penalized regression model using the Generalized Regression platform in JMP Pro. We use the file Bodyfat 07.jmp and fit a model for %Fat using the Lasso with validation.

 

First, we fit a linear regression model for %Fat.

 

To do this, we select Fit Model from the Analyze menu. We select %Fat for the Y role, select all of the predictors as model effects, and use the validation column in the Validation role.

 

Then we select Generalized Regression from the Personality menu.

 

The default response distribution is Normal, and we’ll use it for this model.

 

You can see that we have fit a standard least squares regression model. In the Parameter Estimates table, three of the predictors are significant.

 

However, let’s look at the VIFs, or variance inflation factors, to see whether there is an issue with multicollinearity.

 

To do this, we right-click the Parameter Estimates table and select Columns and then VIF. Many of these values are greater than 10, and two of them are greater than 100. A VIF above 10 is a common rule-of-thumb sign of serious multicollinearity, so this tells us that multicollinearity is indeed a problem with these data.
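VIFs can also be computed directly from the predictors. The sketch below is a plain-Python illustration, not JMP’s code: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all of the other predictors, which equals the j-th diagonal element of the inverse of the predictor correlation matrix. The data and function names are hypothetical.

```python
from math import sqrt

# Made-up data (NOT Bodyfat 07.jmp): x2 is almost a copy of x1.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8, 7.2, 7.9]
x3 = [3, 1, 4, 1, 5, 9, 2, 6]


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length columns."""
    n = len(columns[0])
    centered = []
    for col in columns:
        m = sum(col) / n
        centered.append([v - m for v in col])
    norms = [sqrt(sum(v * v for v in c)) for c in centered]
    k = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(k)]
        for i in range(k)
    ]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (fine for small matrices)."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vifs(columns):
    """VIF_j = 1 / (1 - R_j^2) = j-th diagonal entry of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]


for name, v in zip(("x1", "x2", "x3"), vifs([x1, x2, x3])):
    print(f"{name}: VIF = {v:.1f}")
```

With these made-up columns, x1 and x2 get very large VIFs because each is almost a linear function of the other, while x3 stays small.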

 

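When predictors are highly correlated, the least squares estimates become unstable: their variances are inflated, which is exactly what the VIFs measure. In that case, you can use a penalized estimation method instead of least squares regression.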

 

The default method is the Lasso, which is short for Least Absolute Shrinkage and Selection Operator. The Lasso applies a penalty to shrink the parameter estimates. Because it can shrink the estimates to zero, the Lasso can also be used for variable selection.
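For a normal response, the Lasso is usually written as the penalized least squares problem below. This is the standard textbook formulation, shown for reference; the 1/(2n) scaling is one common convention, and JMP’s internal scaling of the loss may differ.

```latex
\hat{\beta}(\lambda) = \arg\min_{\beta_0,\,\beta}\;
  \frac{1}{2n}\sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\,\beta_j\Bigr)^2
  \;+\; \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```

Setting lambda to 0 recovers ordinary least squares. As lambda grows, the absolute-value penalty pulls the estimates toward zero and can set some of them exactly to zero, which is what makes the Lasso a variable selection method. The intercept is not penalized, and the predictors are typically centered and scaled first so that the penalty treats them on a common scale.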

 

There are many options available for the Lasso, but we’ll run the model with the defaults.

 

We used a validation column when we specified the model, so Validation Column is the default validation method.

 

We’ll click Go to run this model.

 

The best model, selected by the Lasso, is shown. This is the model with the optimum value of the penalty parameter, Lambda.

 

Let’s start with all of the terms in the model, where no penalty is applied. This is the same as our least squares model. To do this, we drag the red line in the Solution Path plot for the parameter estimates all the way to the right.

 

The solution path shows the magnitude of the centered and scaled parameter estimates. To show these estimates, we select Regression Reports from the red triangle for the model, and then Parameter Estimates for Centered and Scaled Predictors.

 

The top line is the parameter estimate for abdomen, which is 117.87.

 

The second line from the top is the estimate for BMI, which is 88.9.

 

As we drag the red line from right to left, we increase the value of the penalty parameter, Lambda. You can see that, as we do this, many of the parameter estimates become smaller. That is, they shrink.

 

For example, the estimate for BMI has shrunk from 88.9 to 51.25. The parameter estimate for weight has shrunk to zero. This term is removed from the model.

 

Here, a large penalty has been applied. Everything has been removed from the model except height and abdomen.
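The shrinkage seen in the solution path can be reproduced outside JMP with a small coordinate-descent Lasso. This is a minimal sketch, not JMP’s algorithm: it assumes the textbook objective (1/2n)·RSS + lambda·sum|b_j| on centered and scaled predictors, and it uses made-up data rather than Bodyfat 07.jmp. The helper names (`standardize`, `soft_threshold`, `lasso_cd`) are hypothetical. The soft-thresholding step is what sets a coefficient to exactly zero once its correlation with the partial residual falls below lambda.

```python
from statistics import mean, pstdev

# Made-up data (NOT Bodyfat 07.jmp): x1 and x2 are strongly correlated,
# x3 is mostly unrelated to the response.
x1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
x2 = [2.0, 1.0, 4.0, 3.0, 6.5, 5.0, 8.0, 6.5, 10.0, 9.0]
x3 = [2, 7, 1, 8, 2, 8, 1, 8, 2, 8]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 15.2, 16.8, 19.1, 21.0]


def standardize(col):
    """Center a column at its mean and scale it to unit population variance."""
    m, s = mean(col), pstdev(col)
    return [(v - m) / s for v in col]


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; return exactly 0.0 when |z| <= gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, max_sweeps=10_000, tol=1e-12):
    """Coordinate descent for (1/2n)*RSS + lam*sum(|b_j|).

    X is a list of standardized predictor columns; y must be centered,
    so no intercept is estimated.
    """
    n, p = len(y), len(X)
    b = [0.0] * p
    resid = list(y)  # residuals y - X b, starting from b = 0
    for _ in range(max_sweeps):
        biggest_step = 0.0
        for j in range(p):
            # (1/n) * x_j . (partial residual), using (1/n) * x_j . x_j == 1
            rho = sum(X[j][i] * resid[i] for i in range(n)) / n + b[j]
            new = soft_threshold(rho, lam)
            delta = new - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[j][i]
                b[j] = new
                biggest_step = max(biggest_step, abs(delta))
        if biggest_step < tol:
            break
    return b


X = [standardize(c) for c in (x1, x2, x3)]
y_mean = mean(y)
yc = [v - y_mean for v in y]
n = len(yc)
# Smallest lambda at which every coefficient is zero.
lam_max = max(abs(sum(X[j][i] * yc[i] for i in range(n)) / n) for j in range(len(X)))

for frac in (1.0, 0.5, 0.2, 0.05):
    coefs = lasso_cd(X, yc, frac * lam_max)
    print(f"lambda = {frac * lam_max:7.4f}  estimates = {[round(c, 3) for c in coefs]}")
```

Reading the printed rows from top to bottom mirrors dragging the red line from left to right: the first row has every estimate at zero, and as lambda decreases the estimates grow and predictors typically enter one at a time.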

 

How do we determine when to stop removing terms from the model? That is, how do we determine the optimal value of the penalty parameter?

 

Because this model was built using a validation column, JMP chooses the penalty using a statistic called the scaled negative log-likelihood.

 

Each model along the solution path is fit using the data in the training set. The best model is the one with the minimum value of the scaled negative log-likelihood for the validation set.
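The selection rule can be sketched as follows. This is a simplified illustration under stated assumptions: it treats the scaled negative log-likelihood as the Normal negative log-likelihood averaged over the validation rows, with each model’s error variance taken from its training fit, and the predictions are made-up numbers. JMP’s exact computation may differ, and the function names are hypothetical.

```python
from math import log, pi


def scaled_gaussian_nll(y, yhat, sigma2):
    """Negative log-likelihood of y under Normal(yhat, sigma2), divided by n.

    Assumption: "scaled" is taken to mean "per observation"; JMP's exact
    definition of the scaled -LogLikelihood may differ.
    """
    n = len(y)
    sse = sum((obs - pred) ** 2 for obs, pred in zip(y, yhat))
    return 0.5 * log(2 * pi * sigma2) + sse / (2 * sigma2 * n)


def pick_lambda(candidates, y_valid):
    """Return the lambda whose model scores lowest on the validation set.

    candidates maps lambda -> (validation-set predictions, error variance
    estimated from that model's training-set fit).
    """
    scores = {
        lam: scaled_gaussian_nll(y_valid, yhat, sigma2)
        for lam, (yhat, sigma2) in candidates.items()
    }
    return min(scores, key=scores.get), scores


# Hypothetical validation responses and predictions from three fits.
y_valid = [10.0, 12.0, 15.0, 11.0]
candidates = {
    0.0: ([9.0, 13.5, 13.0, 12.5], 0.5),   # no penalty: fits training data closely
    0.5: ([10.2, 12.1, 14.6, 11.3], 1.0),  # moderate penalty
    5.0: ([12.0, 12.0, 12.0, 12.0], 4.0),  # heavy penalty: nearly constant
}
best, scores = pick_lambda(candidates, y_valid)
print("best lambda:", best)  # → best lambda: 0.5
```

The unpenalized fit looks best on its own training data but generalizes poorly, and the heavily penalized fit underfits; the validation score favors the model in between.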

 

This best model includes several parameters, but weight, BMI, thigh, and knee have been removed from the model: their estimates are zero.

 

Let’s look at this reduced model. To do this, we select Show Prediction Expression from the red triangle for the model.

 

The parameter estimates are simply coefficients in a linear model, the same kind of prediction formula you’d get from a multiple linear regression model, although the penalty has shrunk their values.
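If the Lasso was fit on centered and scaled predictors, as the solution path suggests, the estimates can be mapped back to a formula in the original units. The sketch below assumes the standardization z_j = (x_j - m_j) / s_j and a centered response; the function names and numbers are hypothetical. Predictors with a zero estimate simply drop out of the formula, which is how the Lasso’s variable selection shows up in a prediction expression.

```python
def to_original_scale(b_scaled, means, sds, y_mean):
    """Turn estimates for standardized predictors into an intercept and slopes
    for the original predictors.

    Uses y = y_mean + sum_j b_j * (x_j - m_j) / s_j.
    """
    slopes = [b / s for b, s in zip(b_scaled, sds)]
    intercept = y_mean - sum(c * m for c, m in zip(slopes, means))
    return intercept, slopes


def predict(intercept, slopes, x):
    """Evaluate the linear prediction formula at one row of predictor values."""
    return intercept + sum(c * v for c, v in zip(slopes, x))


# Hypothetical estimates for three standardized predictors; the second one
# was shrunk to zero by the Lasso, so it contributes nothing to predictions.
intercept, slopes = to_original_scale(
    b_scaled=[2.0, 0.0, -1.0], means=[10.0, 5.0, 3.0], sds=[2.0, 1.0, 0.5], y_mean=20.0
)
print(intercept, slopes)  # → 16.0 [1.0, 0.0, -2.0]
```

At the predictor means, the formula returns the mean response, as it should for a model fit to centered data.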

 

This makes penalized models easy to interpret and explain.
