I am applying lasso to a design matrix that has a couple of singularity issues: a couple of columns are linear combinations of other columns. I have run lasso in two ways. In the first, I leave the matrix as is and let lasso report the singularities. In the second, I remove the two columns that generate the singularities. The resulting models are a bit different. I also noticed that when I launch lasso with the full-column-rank matrix, it automatically runs logistic regression, and I have to take an extra step to make it run lasso. Is there a general protocol for removing singularities when they are evident?
The design matrix was supplied as part of a project as a set of black-box features, and we are supposed to see whether some form of machine learning can predict a binary response from them. The columns of X are a dump of all the data collected. They are used in a very complex manner to generate the response, and they have a very complex correlation structure. In some cases the columns are either redundant or perfectly collinear with other columns. So, to your point, I would never add them intentionally. However, I imagine people are often handed wide design matrices and simply let lasso take care of the collinearity. I prefer to investigate and remove obvious redundancy, with the added benefit that I can at least include vanilla logistic regression as one of the machine learning choices. BTW, I now have JMP Pro and it's great!
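For anyone who wants to do the same kind of check outside JMP, here is a minimal Python sketch of one common way to find exactly redundant columns: a column-pivoted QR decomposition. It is not what JMP does internally, as far as I know. The helper name `independent_columns`, the tolerance `tol`, and the synthetic matrix are all made up for illustration, and the sketch assumes NumPy and SciPy are installed.

```python
import numpy as np
from scipy.linalg import qr


def independent_columns(X, tol=1e-10):
    """Return sorted indices of a maximal linearly independent set of columns.

    Hypothetical helper: uses column-pivoted QR, X[:, piv] = Q @ R, where the
    magnitudes of diag(R) are non-increasing.
    """
    _, R, piv = qr(X, mode="economic", pivoting=True)
    diag = np.abs(np.diag(R))
    if diag.size == 0 or diag[0] == 0:
        return np.array([], dtype=int)
    # Numerical rank: diagonal entries above a relative tolerance.
    rank = int(np.sum(diag > tol * diag[0]))
    return np.sort(piv[:rank])


# Synthetic example (assumed data): 4 independent columns plus 2 exact
# linear combinations, mimicking the two singular columns described above.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
X = np.column_stack([X, X[:, 0] + X[:, 1], 2.0 * X[:, 2]])

keep = independent_columns(X)
dropped = sorted(set(range(X.shape[1])) - set(keep.tolist()))
X_reduced = X[:, keep]  # full column rank, usable for plain logistic regression
```

Which member of a dependent group gets dropped depends on the pivoting order, so it is worth looking at `dropped` and deciding by subject-matter knowledge which columns to keep. Columns that are only nearly collinear will pass this check and are still left for lasso to handle.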