Re: Lack of Fit in Logistic Regression Report

ranjan_mitre_or · Jul 5, 2018 04:05 PM

I have performed Logistic Regression analysis on a data set that contains 6 binary factors and 1 continuous factor. Then I repeat the analysis after converting the continuous parameter to binary by thresholding (0 if <= threshold, 1 otherwise).

With the first analysis I get the following Lack of Fit table in the report

Lack Of Fit

Source	DF	-LogLikelihood	ChiSquare
Lack Of Fit	1.85e+7	4433.7627	8867.525
Saturated	1.85e+7	387.7085	Prob>ChiSq
Fitted	7	4821.4712	1.0000

When I repeat the analysis after converting the continuous variable to binary I get the following Lack of Fit table in which Prob>ChiSq is now 0.0009 in place of the earlier value of 1.0000.

Lack Of Fit

Source	DF	-LogLikelihood	ChiSquare
Lack Of Fit	120	87.1476	174.2951
Saturated	127	4778.6278	Prob>ChiSq
Fitted	7	4865.7753	0.0009*

I cannot find the description of Lack of Fit in the documentation for Nominal Logistic Fit Report. In the context of "Lack of Fit", do I want the value to be close to 1 for the model to be fitting well to the data? What does the asterisk next to 0.0009 mean?

Mark_Bailey · Jul 6, 2018 09:09 AM

I do not recommend replacing a continuous predictor with a binary predictor. Binary variables are less informative.

The difference you observe is due to the change in the degrees of freedom. The first analysis includes 1.85E+7 degrees of freedom in the test of the sample statistic of 8867.525 while the second analysis includes only 120 DF for the corresponding sample statistic of 174.2951.
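The numbers in the two tables can be checked by hand. In both, the ChiSquare value appears to be twice the gap between the Fitted and Saturated -LogLikelihood, and the Lack Of Fit DF in the second table is the Saturated DF minus the Fitted DF (127 - 7 = 120). The short Python sketch below only redoes that arithmetic from the values posted above. The function and variable names are made up for illustration, and it is not a statement of how JMP computes the report internally.

```python
# Values copied from the two Lack Of Fit tables posted in the question.
tables = {
    "continuous": {"fitted_nll": 4821.4712, "saturated_nll": 387.7085,
                   "reported_chisq": 8867.525},
    "binary": {"fitted_nll": 4865.7753, "saturated_nll": 4778.6278,
               "reported_chisq": 174.2951},
}


def lack_of_fit_chisq(fitted_nll, saturated_nll):
    """Likelihood-ratio statistic: twice the gap in -LogLikelihood."""
    return 2.0 * (fitted_nll - saturated_nll)


# Recompute the statistic for each table and print it next to the reported one.
results = {}
for name, t in tables.items():
    results[name] = lack_of_fit_chisq(t["fitted_nll"], t["saturated_nll"])
    print(f"{name}: recomputed {results[name]:.4f}, reported {t['reported_chisq']}")

# DF for the second table: saturated DF minus fitted DF.
binary_lack_of_fit_df = 127 - 7
print("binary Lack Of Fit DF:", binary_lack_of_fit_df)
```

The recomputed values agree with the reported ChiSquare column up to rounding in the last printed digit.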

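The lack of fit test is based on a comparison between the selected model and the saturated model (unbiased). The null hypothesis assumes that they are the same, that is, the selected model has no lack of fit. The expected value of chi square under the null hypothesis is equal to the DF. When the selected model does lack fit, chi square tends to exceed the DF. The associated p-value is the probability, under the null hypothesis, of a chi square at least as large as the sample statistic from the analysis. A small p-value is therefore evidence of lack of fit, and a value close to 1 indicates no detectable lack of fit.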
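To see how the DF drives the p-value, here is a rough Python sketch that approximates the upper-tail chi-square probability for both tables. It uses the Wilson-Hilferty normal approximation so that only the standard library is needed. That is my choice for illustration, not the method JMP uses, and the function name `chi2_upper_tail_wh` is made up. The DF of 1.85e+7 is taken as printed in the report, so it is a rounded value.

```python
import math


def chi2_upper_tail_wh(x, df):
    """Approximate P(chi-square >= x) with the Wilson-Hilferty approximation.

    (X / df) ** (1/3) is treated as roughly normal with mean 1 - 2/(9*df)
    and variance 2/(9*df). This is an approximation, not an exact tail area.
    """
    c = 2.0 / (9.0 * df)
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - c)) / math.sqrt(c)
    # Upper tail of the standard normal distribution at z.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# First analysis: statistic far below its DF, so the p-value is essentially 1.
p_continuous = chi2_upper_tail_wh(8867.525, 1.85e7)

# Second analysis: statistic well above its DF of 120, so the p-value is small.
p_binary = chi2_upper_tail_wh(174.2951, 120)

print(f"continuous predictor: p ~ {p_continuous:.4f}")
print(f"binary predictor:     p ~ {p_binary:.4f}")
```

In the first table the statistic (8867.525) is tiny compared with its DF (about 18.5 million), so almost every chi-square value expected under the null hypothesis exceeds it. In the second table the statistic (174.2951) is well above its DF (120), which gives a p-value on the order of the reported 0.0009.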