Liz_S
Level II

When running Logistic Regressions with no intercepts, has anyone observed very high General RSquares, about 25-30 points higher than models with intercepts?

On a couple of different projects with a 0/1 outcome variable to predict, I have noticed that running the logistic regression (generalized, with binomial variance) with the No Intercept box checked boosts the Generalized R Square substantially, by about 25 to 30 percentage points. My latest model is for a rare event that is observed at about 1.2% in the experience-period population. Predictive models with an intercept yield Generalized R Squares of about 0.60 to 0.67, while models without the intercept come in at about 0.95 to 0.98.

I do have some key variables that are highly predictive, so at first I thought Generalized R Squares above 90% seemed reasonable. But the confusion matrix shows more errors than I would like, even after lowering the threshold to 2%-5%.

I do like the idea that No Intercept implies a neutral log-odds for the constant term: with Intercept = 0, the baseline is like flipping a coin, Probability = 0.50. But perhaps that baseline is too easy to beat, which inflates the Generalized R Square, since it depends on the likelihood ratio of the baseline model (L0, the intercept-only model) to the fitted model (LM, with the X predictors). That is particularly so if I know beforehand (a priori) that the outcome event is rare.

So, while it would be great to write a brief that reports a Generalized R Square of about 95%-98%, I think it might be more prudent and practical to use a model with an intercept that comes in at a Generalized R Square of about 60%. Please reply if you have any advice for me. Thanks!
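The baseline effect described above can be sketched numerically. This is only an illustration under stated assumptions, not a statement of what JMP computes internally: it assumes the Generalized R Square takes the max-rescaled (Nagelkerke-style) form, and that a no-intercept fit is compared against a baseline with a linear predictor of 0 (probability 0.50) rather than against the intercept-only model. The sample size, event count, and fitted log-likelihood are hypothetical values chosen to mimic a 1.2% event rate.

```python
import math

def null_loglik(n, events, p):
    """Bernoulli log-likelihood of n observations with `events` ones at a constant probability p."""
    return events * math.log(p) + (n - events) * math.log(1.0 - p)

def generalized_r2(ll_null, ll_model, n):
    """Max-rescaled generalized R-square computed from two log-likelihoods (assumed form)."""
    cox_snell = 1.0 - math.exp(2.0 / n * (ll_null - ll_model))
    max_value = 1.0 - math.exp(2.0 / n * ll_null)
    return cox_snell / max_value

# Hypothetical data: 10,000 rows, 120 events (a 1.2% event rate).
n, events = 10_000, 120
# Hypothetical fitted log-likelihood, held fixed so only the baseline changes.
ll_model = -300.0

# Baseline 1: intercept-only model, which predicts the observed event rate.
ll_null_intercept = null_loglik(n, events, events / n)
# Baseline 2 (assumption): no parameters at all, so every row gets probability 0.50.
ll_null_zero = null_loglik(n, events, 0.5)

r2_vs_intercept = generalized_r2(ll_null_intercept, ll_model, n)
r2_vs_zero = generalized_r2(ll_null_zero, ll_model, n)

print(f"Generalized R2 vs intercept-only baseline: {r2_vs_intercept:.3f}")
print(f"Generalized R2 vs coin-flip baseline:      {r2_vs_zero:.3f}")
```

With the same fitted log-likelihood, the coin-flip baseline gives a much higher value, because a 50/50 guess fits a rare event very poorly and almost any model improves on it by a wide margin.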

1 ACCEPTED SOLUTION

Liz_S
Level II

Re: When running Logistic Regressions with no intercepts, has anyone observed very high General RSquares, about 25-30 points higher than models with intercepts?

Hello, this morning I received a detailed response from JMP Support with numerous suggestions that will help me model more efficiently in JMP and that point me to some techniques I have not used before. The response (from Patrick Giuliano) referenced this article and advice: "The importance of 'many approaches' leads to a common and defendable solution. From Lavine, M., Frequentist, Bayes, or Other? (Summarized in Editorial, THE AMERICAN STATISTICIAN, 2019, VOL. 73, NO. S1, 1-19): 1. Look for and present results from many models that fit the data well. 2. Evaluate models, not just procedures."

Essentially, I learned that the very high Generalized R Squares (~98%) for the no-intercept models probably indicate a lack of stability, and that forcing the linear models through the origin was too strong an assumption. Perhaps I should also revisit some of the modeling issues created by the multicollinearity in the predictors. It was a helpful reply! I appreciate being able to reach out to JMP Support with my de-identified data and my scripts. Thanks much!

 


4 REPLIES

Re: When running Logistic Regressions with no intercepts, has anyone observed very high General RSquares, about 25-30 points higher than models with intercepts?

I compared the models with and without an intercept in a few examples and always observed the opposite trend: the R square metrics were better when the model included an intercept term. If you can reproduce the results that you reported, then I suggest contacting JMP Technical Support (support@jmp.com) to get a resolution. Please reply to this discussion to capture their findings for the benefit of the Community.

Liz_S
Level II

Re: When running Logistic Regressions with no intercepts, has anyone observed very high General RSquares, about 25-30 points higher than models with intercepts?

Yes, I am sending my example JMP file to JMP Support today. The model without an intercept has a Generalized RSquare of 98%, and the model with an intercept has a Generalized RSquare of 55%. I'll keep the community posted.

Re: When running Logistic Regressions with no intercepts, has anyone observed very high General RSquares, about 25-30 points higher than models with intercepts?

I'm glad that you got a helpful answer. Best of luck in all your modeling!