ranjan_mitre_or
Level III

Lack of Fit in Logistic Regression Report

I performed a Logistic Regression analysis on a data set that contains 6 binary factors and 1 continuous factor. I then repeated the analysis after converting the continuous factor to binary by thresholding (0 if <= threshold, 1 otherwise).

With the first analysis I get the following Lack of Fit table in the report:

Lack Of Fit

Source         DF         -LogLikelihood    ChiSquare    Prob>ChiSq
Lack Of Fit    1.85e+7    4433.7627         8867.525     1.0000
Saturated      1.85e+7    387.7085
Fitted         7          4821.4712

 

When I repeat the analysis after converting the continuous variable to binary I get the following Lack of Fit table in which Prob>ChiSq is now 0.0009 in place of the earlier value of 1.0000.

Lack Of Fit

Source         DF     -LogLikelihood    ChiSquare    Prob>ChiSq
Lack Of Fit    120    87.1476           174.2951     0.0009*
Saturated      127    4778.6278
Fitted         7      4865.7753

 

I cannot find the description of Lack of Fit in the documentation for Nominal Logistic Fit Report. In the context of "Lack of Fit", do I want the value to be close to 1 for the model to be fitting well to the data? What does the asterisk next to 0.0009 mean?


Re: Lack of Fit in Logistic Regression Report

I do not recommend replacing a continuous predictor with a binary predictor. Binary variables are less informative.

 

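The difference you observe is due to the change in the degrees of freedom (DF). In the first analysis, the sample statistic of 8867.525 is tested against 1.85E+7 DF, while in the second analysis the corresponding sample statistic of 174.2951 is tested against only 120 DF. The first statistic is far smaller than its DF, which gives a p-value of essentially 1; the second is noticeably larger than its DF, which gives a small p-value.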
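To illustrate how the DF drive the p-value, here is a minimal sketch in Python that recomputes the upper-tail probability for both reported statistics. It uses only the standard library and the Wilson-Hilferty normal approximation to the chi-square distribution, not whatever method JMP uses internally; the function name chi2_upper_tail is made up for this illustration.

```python
import math


def chi2_upper_tail(chisq: float, df: float) -> float:
    """Approximate P(X >= chisq) for X ~ chi-square(df).

    Assumption: Wilson-Hilferty approximation, in which (X/df)^(1/3) is
    treated as normal with mean 1 - 2/(9*df) and variance 2/(9*df).
    """
    variance = 2.0 / (9.0 * df)
    z = ((chisq / df) ** (1.0 / 3.0) - (1.0 - variance)) / math.sqrt(variance)
    # Upper tail of the standard normal distribution at z.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# ChiSquare and DF values copied from the two Lack of Fit tables in the question.
p_continuous = chi2_upper_tail(8867.525, 1.85e7)  # statistic far below its DF
p_binary = chi2_upper_tail(174.2951, 120)         # statistic well above its DF

print(f"continuous factor: p ~ {p_continuous:.4f}")
print(f"binary factor:     p ~ {p_binary:.4f}")
```

The two results should come out close to the reported Prob>ChiSq values of 1.0000 and 0.0009. If SciPy is available, scipy.stats.chi2.sf(chisq, df) gives the tail probability without the approximation.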

 

The lack of fit test is based on a comparison between the selected model and the saturated model (unbiased). The null hypothesis is that the two models fit the data equally well. Under the null hypothesis, the expected value of chi square is equal to the DF. When the selected model lacks fit, chi square tends to exceed the DF. The associated p-value is the probability, under the null hypothesis, of obtaining a chi square at least as large as the sample statistic from the analysis.
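The comparison between the selected and saturated models can be checked against the numbers in the question. The sketch below assumes the usual likelihood-ratio form, in which chi square is twice the difference between the Fitted and Saturated -LogLikelihood values and the lack of fit DF is the difference in DF. The helper name and dictionary keys are made up for this illustration.

```python
def lack_of_fit_chisq(fitted_nll: float, saturated_nll: float) -> float:
    """Likelihood-ratio chi square: 2 * (Fitted - Saturated) -LogLikelihood."""
    return 2.0 * (fitted_nll - saturated_nll)


# -LogLikelihood values copied from the two Lack of Fit tables in the question.
chisq_continuous = lack_of_fit_chisq(fitted_nll=4821.4712, saturated_nll=387.7085)
chisq_binary = lack_of_fit_chisq(fitted_nll=4865.7753, saturated_nll=4778.6278)

# DF for the binary analysis: Saturated DF minus Fitted DF.
# (The continuous analysis reports its DF only as the rounded value 1.85e+7,
# so the same subtraction is not reproduced for it here.)
df_binary = 127 - 7

print(f"continuous factor: ChiSquare ~ {chisq_continuous:.3f}")
print(f"binary factor:     ChiSquare ~ {chisq_binary:.4f}, DF = {df_binary}")
```

Both statistics agree with the ChiSquare column of the reports to within rounding of the displayed -LogLikelihood values, and the DF for the binary analysis matches the reported 120.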