I've thought a lot about the intercept terms in GLMs, logistic regression in particular. In standard linear regression we can estimate the intercept term without any knowledge of the design matrix, at least when the predictors are centered: it's just the mean of the responses. I figured that principle would stay the same in other models. The principle is:
- Once you know the responses and the link function, you can determine the intercept term. It should be the same as the intercept you get from fitting the glm without any predictors (the NULL model).
This is not the case. The intercepts are very close to the NULL model's intercept, but not equal to it. I also get different intercept values if I change the design matrix. Maybe it's due to the fact that the coefficients are found by an iterative search algorithm. I also did the same thing for a Poisson glm and found the same thing: the intercept is close to the NULL model's but not the same.
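Here is a minimal sketch of the comparison for the logistic case, in plain Python so it doesn't depend on any particular glm package. Everything in it is an assumption for illustration: the simulated data (true intercept -1, true slope 2), the single predictor, and the hand-rolled Newton solver `fit_logistic`. It fits the NULL model and the model with one predictor and compares the two intercepts with the logit of the mean response.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, use_slope=True, n_iter=50):
    """Newton's method for logit P(y=1) = b0 + b1*x; b1 stays at 0 if use_slope is False."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        g0 = sum(yi - pi for yi, pi in zip(y, p))        # score for the intercept
        h00 = sum(w)
        if not use_slope:
            b0 += g0 / h00
            continue
        g1 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, x))
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 Newton system by hand.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Simulated data (hypothetical: true intercept -1, true slope 2).
random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(2000)]
y = [1.0 if random.random() < sigmoid(-1.0 + 2.0 * xi) else 0.0 for xi in x]

y_bar = sum(y) / len(y)
closed_form = math.log(y_bar / (1.0 - y_bar))      # logit of the mean response
null_b0, _ = fit_logistic(x, y, use_slope=False)   # intercept-only (NULL) fit
full_b0, full_b1 = fit_logistic(x, y)              # fit with the predictor
fitted_mean = sum(sigmoid(full_b0 + full_b1 * xi) for xi in x) / len(x)

print("logit(mean y):", closed_form)
print("NULL intercept:", null_b0)
print("intercept with predictor:", full_b0)
print("mean fitted prob vs mean y:", fitted_mean, y_bar)
```

In this sketch the NULL intercept matches the logit of the mean response, while the intercept of the larger model does not. The last line checks that the fitted probabilities still average to the mean response, which is what the intercept's score equation requires. Since that condition involves the other coefficients through the nonlinear link, it may be the reason the intercept moves with the design matrix, rather than the search algorithm itself.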
Interestingly, if you run LASSO and keep increasing the penalty, the intercept does converge to (shrink toward) the NULL model's intercept. This result is independent of the design matrix.
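The LASSO behaviour can be sketched the same way. This is only an illustration under assumptions that are mine: simulated data, one predictor, an intercept that is not penalized, and a simple proximal gradient solver (the hypothetical `lasso_logistic`) rather than whatever a real LASSO package does internally. The penalty is expressed as a fraction of `lam_max`, the smallest penalty at which the slope is exactly zero.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lasso_logistic(x, y, lam, n_iter=300):
    """Proximal gradient (ISTA) for mean log-loss + lam*|b1|; the intercept b0 is not penalized."""
    n = len(x)
    # Step size 1/L, with L an upper bound on the curvature of the mean log-loss.
    step = 1.0 / (0.25 * (1.0 + sum(xi * xi for xi in x) / n))
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        r = [sigmoid(b0 + b1 * xi) - yi for xi, yi in zip(x, y)]
        g0 = sum(r) / n
        g1 = sum(ri * xi for ri, xi in zip(r, x)) / n
        b0 -= step * g0                                  # plain gradient step
        z = b1 - step * g1
        b1 = math.copysign(max(abs(z) - step * lam, 0.0), z)  # soft-threshold
    return b0, b1

# Simulated data (hypothetical: true intercept -1, true slope 2).
random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(1000)]
y = [1.0 if random.random() < sigmoid(-1.0 + 2.0 * xi) else 0.0 for xi in x]

n = len(x)
y_bar = sum(y) / n
null_b0 = math.log(y_bar / (1.0 - y_bar))   # NULL-model intercept
lam_max = abs(sum((yi - y_bar) * xi for xi, yi in zip(x, y))) / n

path = []
for frac in (0.0, 0.3, 0.6, 0.9, 1.5):
    b0, b1 = lasso_logistic(x, y, frac * lam_max)
    path.append((frac, b0, b1))
    print(f"lam = {frac:.1f} * lam_max  intercept = {b0:.4f}  slope = {b1:.4f}")
print("NULL intercept:", null_b0)
```

Once the penalty is large enough to zero out the slope, what remains is the NULL model, so the intercept lands on the logit of the mean response whatever the predictor looked like.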
I suppose it would be cool if you could simply calculate the intercept term from the responses and the link function prior to estimating the covariate coefficients. Would the model be better?
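One way to probe that question, at least in terms of in-sample fit, is to fix the intercept at the NULL-model value, estimate only the slope, and compare the log-likelihood with the usual joint fit. The sketch below does this on simulated data; the data, the single predictor, and the helper names (`fit_joint`, `fit_slope_only`, `log_lik`) are all assumptions made for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_lik(x, y, b0, b1):
    """Bernoulli log-likelihood of logit P(y=1) = b0 + b1*x."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = sigmoid(b0 + b1 * xi)
        total += math.log(p) if yi == 1.0 else math.log(1.0 - p)
    return total

def fit_slope_only(x, y, b0, n_iter=50):
    """Newton's method for the slope with the intercept b0 held fixed."""
    b1 = 0.0
    for _ in range(n_iter):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        g = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, x))
        h = sum(pi * (1.0 - pi) * xi * xi for pi, xi in zip(p, x))
        b1 += g / h
    return b1

def fit_joint(x, y, n_iter=50):
    """Newton's method for intercept and slope together."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, x))
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Simulated data (hypothetical: true intercept -1, true slope 2).
random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(2000)]
y = [1.0 if random.random() < sigmoid(-1.0 + 2.0 * xi) else 0.0 for xi in x]

y_bar = sum(y) / len(y)
fixed_b0 = math.log(y_bar / (1.0 - y_bar))     # intercept computed up front
fixed_b1 = fit_slope_only(x, y, fixed_b0)
joint_b0, joint_b1 = fit_joint(x, y)

ll_fixed = log_lik(x, y, fixed_b0, fixed_b1)
ll_joint = log_lik(x, y, joint_b0, joint_b1)
print("log-likelihood, intercept fixed up front:", ll_fixed)
print("log-likelihood, joint fit:", ll_joint)
```

Because the fixed-intercept fit is a constrained version of the joint fit, its maximized likelihood cannot exceed the joint one. By that measure the precomputed intercept would not give a better model, though this says nothing about out-of-sample behaviour.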