gene

Community Trekker

Joined:

Jun 23, 2011

Lasso singularity details

I am applying lasso to a design matrix that has a couple of singularity issues: a couple of columns are linear combinations of other columns. I have run lasso in two cases. In the first, I just let lasso report the singularities. In the second, I removed the two columns that generate the singularities. The resulting models are a bit different. The other thing that occurs is that when I launch lasso with the full-column-rank matrix, it automatically does logistic regression and I have to take an extra step to make it do lasso. Is there a general protocol for removing singularities when they are evident?

1 ACCEPTED SOLUTION

Accepted Solutions
markbailey

Staff

Joined:

Jun 23, 2011

Solution

Re: Lasso singularity details

Why would you enter a variable that is known to be a linear combination of other variables used in the model? Would you expect the models to be the same?

 

What is the modeling type of your response variable? If it is categorical, then JMP is trying to help by selecting logistic regression. But you can override that behavior, as you discovered.

Learn it once, use it forever!
2 REPLIES
gene

Community Trekker

Joined:

Jun 23, 2011

Re: Lasso singularity details

Thanks Mark,

 

The design matrix was supplied as part of a project as a set of black-box features, and we are supposed to see whether we can use some sort of machine learning to predict a binary response. The columns of X are a data dump of all the data collected. They are used in a very complex manner to generate the response, and they have a very complex correlation structure. In some cases the columns are either redundant or perfectly collinear with other columns. So, to your point, I would never add them in intentionally. However, I imagine folks are often handed wide design matrices and simply let lasso take care of the collinearity. I prefer to investigate and remove obvious redundancy, with the added benefit that I can at least include vanilla logistic regression as one of the machine learning choices. BTW, I now have JMP Pro and it's great!
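For anyone wanting to do this kind of check outside JMP, here is a minimal sketch in plain Python. It is not what JMP does internally, and the function name `find_dependent_columns`, the tolerance `tol`, and the toy matrix are all made up for illustration. It walks the columns left to right with Gram-Schmidt and flags any column that is (numerically) a linear combination of the columns kept so far. Which column of a dependent set gets flagged depends on column order, so that choice is still a judgment call.

```python
def find_dependent_columns(X, tol=1e-10):
    """Return indices of columns that are linear combinations of earlier columns.

    X is a list of rows (each a list of numbers). `tol` is a hypothetical
    relative tolerance chosen for this sketch, not a recommended default.
    """
    n_rows = len(X)
    n_cols = len(X[0]) if n_rows else 0
    basis = []       # orthonormal vectors spanning the columns kept so far
    dependent = []   # indices of columns flagged as redundant

    for j in range(n_cols):
        col = [float(X[i][j]) for i in range(n_rows)]
        norm0 = sum(v * v for v in col) ** 0.5

        # Project out every kept direction (modified Gram-Schmidt).
        resid = col[:]
        for q in basis:
            dot = sum(r * qi for r, qi in zip(resid, q))
            resid = [r - dot * qi for r, qi in zip(resid, q)]
        norm = sum(v * v for v in resid) ** 0.5

        # Nothing new left in this column: it is redundant (or all zeros).
        if norm0 == 0.0 or norm <= tol * norm0:
            dependent.append(j)
        else:
            basis.append([v / norm for v in resid])
    return dependent


# Toy design matrix: columns are a, b, a + b, d, 2*d - a.
a = [1, 2, 3, 4, 5]
b = [2, 1, 0, 1, 3]
d = [1, 0, 1, 0, 2]
X = [[a[i], b[i], a[i] + b[i], d[i], 2 * d[i] - a[i]] for i in range(5)]

redundant = find_dependent_columns(X)
keep = [j for j in range(len(X[0])) if j not in redundant]
X_reduced = [[row[j] for j in keep] for row in X]
print("redundant columns:", redundant)
```

The reduced matrix `X_reduced` has full column rank, so it can be passed to an ordinary (unpenalized) logistic regression as well as to lasso. Near-collinear columns that are not exact combinations will not be caught at a tolerance this tight.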