
Re: Lack of Fit in Logistic Regression Report

ranjan_mitre_or

Community Trekker

Joined: Oct 16, 2013

I performed a Logistic Regression analysis on a data set that contains 6 binary factors and 1 continuous factor. I then repeated the analysis after converting the continuous factor to binary by thresholding (0 if <= threshold, 1 otherwise).

With the first analysis I get the following Lack of Fit table in the report:

Lack Of Fit

Source        DF        -LogLikelihood   ChiSquare   Prob>ChiSq
Lack Of Fit   1.85e+7   4433.7627        8867.525    1.0000
Saturated     1.85e+7   387.7085
Fitted        7         4821.4712

When I repeat the analysis after converting the continuous variable to binary I get the following Lack of Fit table in which Prob>ChiSq is now 0.0009 in place of the earlier value of 1.0000.

Lack Of Fit

Source        DF    -LogLikelihood   ChiSquare   Prob>ChiSq
Lack Of Fit   120   87.1476          174.2951    0.0009*
Saturated     127   4778.6278
Fitted        7     4865.7753

I cannot find the description of Lack of Fit in the documentation for Nominal Logistic Fit Report. In the context of "Lack of Fit", do I want the value to be close to 1 for the model to be fitting well to the data? What does the asterisk next to 0.0009 mean?

1 REPLY
markbailey

Staff

Joined: Jun 23, 2011

I do not recommend replacing a continuous predictor with a binary predictor. Dichotomizing discards information, so the binary variable is less informative than the original.

The difference you observe is due to the change in the degrees of freedom. The first analysis tests a sample statistic of 8867.525 against 1.85E+7 DF, while the second tests a sample statistic of 174.2951 against only 120 DF.

The lack of fit test compares the selected model with the saturated model (unbiased). The null hypothesis is that the two fit equally well. Under the null hypothesis, the expected value of chi square equals the DF, although an individual sample chi square can exceed the DF by chance. The associated p-value is the probability, under the null hypothesis, of a chi square at least as large as the sample statistic from the analysis. In the first analysis the statistic is far below its DF, so the p-value is 1.0000; in the second it is well above its DF, so the p-value is small (0.0009).
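
As a rough check of the arithmetic, here is a minimal Python sketch (not JMP's implementation) that recomputes both tests from the -LogLikelihood and DF values in the two tables. The function names `chi2_sf` and `lack_of_fit` are made up for this illustration, and the chi square tail probability comes from a textbook incomplete gamma routine that uses only the standard library.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with df DF."""
    if x <= 0.0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Log of z^a * exp(-z) / Gamma(a); kept in log space to avoid overflow.
    log_pref = a * math.log(z) - z - math.lgamma(a)
    if z < a + 1.0:
        # Series expansion of the lower regularized incomplete gamma P(a, z).
        term = total = 1.0 / a
        n = a
        for _ in range(100000):
            n += 1.0
            term *= z / n
            total += term
            if term < total * 1e-16:
                break
        return 1.0 - total * math.exp(log_pref)
    # Continued fraction (modified Lentz method) for the upper tail Q(a, z).
    tiny = 1e-300
    b = z + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 100000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(log_pref)

def lack_of_fit(nll_fitted, nll_saturated, df):
    """Chi square = 2 * (difference in -LogLikelihood), with its p-value."""
    chi_square = 2.0 * (nll_fitted - nll_saturated)
    return chi_square, chi2_sf(chi_square, df)

# Values copied from the two Lack Of Fit tables in the question.
chi_sq_1, p_1 = lack_of_fit(4821.4712, 387.7085, 1.85e7)   # continuous factor
chi_sq_2, p_2 = lack_of_fit(4865.7753, 4778.6278, 120)     # thresholded factor

print(f"continuous: ChiSquare={chi_sq_1:.3f}, ChiSquare/DF={chi_sq_1 / 1.85e7:.6f}, p={p_1:.4f}")
print(f"binary:     ChiSquare={chi_sq_2:.3f}, ChiSquare/DF={chi_sq_2 / 120:.3f}, p={p_2:.4f}")
```

The ChiSquare/DF ratio is printed because the expected value of chi square under the null hypothesis equals the DF: a ratio far below 1 goes with a p-value near 1, and a ratio well above 1 goes with a small p-value.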
