Liz_S
Level II

When running logistic regressions with no intercept, has anyone observed very high Generalized RSquares, about 25-30 points higher than in models with intercepts?

On a couple of different projects with a 0/1 outcome variable to predict, I have noticed that running a logistic regression (generalized, with binomial variance) with the No Intercept box checked boosts the Generalized RSquare substantially, by about 25 or 30 percentage points. My latest model is for a rare event, observed at about 1.2% in the experience-period population. Predictive models with an intercept yield Generalized RSquares of about 0.60 to 0.67, while models without the intercept come in at about 0.95 to 0.98.

I do have some key variables that are highly predictive, so at first the Generalized RSquares above 90% seemed reasonable. But the confusion matrix shows more errors than I would like, even after lowering the probability threshold to 2%-5%.

I do like the idea that No Intercept implies a neutral log-odds for the constant term: with Intercept = 0, the baseline probability is 0.50, like flipping a coin. But perhaps that baseline is too easy to beat, inflating the Generalized RSquare, which depends on the likelihood ratio of the null model (L0) to the fitted model with the X predictors (LM). That seems especially suspect when I know beforehand (a priori) that the outcome event is rare.

So, while it would be great to write a brief reporting a Generalized RSquare of about 95%-98%, I think it would be more prudent and practical to use a model with an intercept that comes in at a Generalized RSquare of 60%. Please reply if you have any advice for me. Thanks!
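To make my hypothesis concrete, here is a rough sketch in Python with statsmodels rather than JSL, on simulated data. It assumes that Generalized RSquare is the Nagelkerke/Cragg-Uhler statistic and that a No Intercept fit gets benchmarked against a p = 0.50 null; I am not certain that is what JMP does internally, so treat it as illustrative only.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=(n, 1))

# Simulate a rare outcome (event rate roughly 1-2%) driven by a
# genuinely predictive x, mimicking the situation described above.
p_true = 1.0 / (1.0 + np.exp(-(-6.5 + 2.0 * x[:, 0])))
y = rng.binomial(1, p_true)

# One fitted model, with an intercept so it can capture the base rate.
fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)

def gen_rsquare(ll_null, ll_model, n):
    # Nagelkerke/Cragg-Uhler: [1 - (L0/LM)^(2/n)] / [1 - L0^(2/n)],
    # computed on the log-likelihood scale.
    cox_snell = 1.0 - np.exp(2.0 * (ll_null - ll_model) / n)
    return cox_snell / (1.0 - np.exp(2.0 * ll_null / n))

ll_base_rate = fit.llnull       # intercept-only null: p-hat = observed event rate
ll_coin_flip = n * np.log(0.5)  # intercept-forced-to-0 null: p = 0.50 for everyone
                                # (my guess at the No Intercept benchmark)

print("vs. base-rate null:", round(gen_rsquare(ll_base_rate, fit.llf, n), 3))
print("vs. coin-flip null:", round(gen_rsquare(ll_coin_flip, fit.llf, n), 3))
```

The same fitted likelihood scores far higher against the coin-flip null, because predicting p = 0.50 for an event that occurs about 1% of the time is trivially easy to beat. That is the inflation mechanism I am worried about.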


4 REPLIES

Re: When running logistic regressions with no intercept, has anyone observed very high Generalized RSquares, about 25-30 points higher than in models with intercepts?

I compared models with and without an intercept in a few examples and always observed the opposite trend: the RSquare metrics were better when the model included an intercept term. If you can reproduce the results that you reported, I suggest contacting JMP Technical Support (support@jmp.com) for a resolution. Please reply to this discussion to capture their findings for the benefit of the Community.

Liz_S
Level II

Re: When running logistic regressions with no intercept, has anyone observed very high Generalized RSquares, about 25-30 points higher than in models with intercepts?

Yes, I am sending my example JMP file to JMP Support today. The model without an intercept has a Generalized RSquare of 98%, and the model with an intercept has a Generalized RSquare of 55%. I'll keep the Community posted.

Liz_S
Level II

Accepted Solution
Re: When running logistic regressions with no intercept, has anyone observed very high Generalized RSquares, about 25-30 points higher than in models with intercepts?

Hello, this morning I received a detailed response from JMP Support with numerous suggestions that will help me work more efficiently in JMP when modeling, as well as some techniques I have not tried yet. The response (from Patrick Giuliano) referenced this article and advice: "The importance of 'many approaches' leads to a common and defensible solution. From Lavine, M., 'Frequentist, Bayes, or Other?' (summarized in the editorial, The American Statistician, 2019, Vol. 73, No. S1, 1-19): 1. Look for and present results from many models that fit the data well. 2. Evaluate models, not just procedures."

Essentially, I learned that the very high Generalized RSquares (~98%) for the no-intercept models probably indicate a lack of stability: forcing the linear predictor through the origin was too strong an assumption. Perhaps I should also revisit some of the modeling issues created by multicollinearity in the predictors. It was a helpful reply! I appreciate being able to reach out to JMP Support with my de-identified data and scripts. Thanks much!
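In case it helps anyone following along: a standard first check for the multicollinearity point is variance inflation factors (VIFs). Here is a generic sketch in Python with statsmodels; the columns are made up for illustration, not my actual de-identified predictors.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
n = 500

# Placeholder predictors: x2 is built to be nearly collinear with x1,
# to show what an inflated VIF looks like.
X = pd.DataFrame({"x1": rng.normal(size=n), "x3": rng.normal(size=n)})
X["x2"] = 0.95 * X["x1"] + rng.normal(scale=0.2, size=n)
X = X[["x1", "x2", "x3"]]

# Include the constant when computing VIFs, then skip it in the output.
Xc = sm.add_constant(X)
vifs = pd.Series(
    [variance_inflation_factor(Xc.values, i) for i in range(1, Xc.shape[1])],
    index=X.columns,
)
print(vifs)  # rule of thumb: values much above ~5-10 are worth revisiting
```

High VIFs do not necessarily mean a variable must be dropped, but they do explain unstable coefficient estimates, which fits the stability issue JMP Support flagged.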

 

Re: When running logistic regressions with no intercept, has anyone observed very high Generalized RSquares, about 25-30 points higher than in models with intercepts?

I'm glad that you got a helpful answer. Best of luck in all your modeling!