Discussions

Amir_H · Jun 8, 2023 5:32 PM

Hi all, I appreciate your time and responses. I have built a model using main effects, interactions, and some second-order terms. I fit it with Standard Least Squares on 60% of the data; about 20% is held out for validation and 20% for testing.

R2 and adjusted R2 on the training data, and R2 on the validation and test sets, are all about 98-99%. VIFs are all between 1 and 2.

The only issue is I have a significant lack of fit.

What can I do here? Can I ignore the LoF?

PS: I noticed that when I take the model terms selected by GenReg (pruned, Lasso, etc.) and fit them in Standard Least Squares, I get almost the same model, again with a significant lack of fit. Does that mean lack of fit is not taken into account in GenReg?

P_Bartell · Apr 13, 2021 12:16 PM

Significant LOF can come about from a variety of sources. A few questions:

1. Do your residual plots give some clues as to the source of LOF? Do the plots show any non-random structure or suspicious patterns?

2. What is the purpose of the model? Explanatory or prediction? If it's explanatory, LOF is probably not as problematic. If it's predictive, how much does the LOF actually affect the usefulness of the model's predictions across the factor space? In other words, how large is the 'wrongness' of the predictions in practical terms? If the magnitude of the wrongness isn't practically impactful, I wouldn't be as worried about living with a model that has significant LOF.

Amir_H · Apr 13, 2021 12:38 PM

Thanks for the quick reply @P_Bartell. Please see the attached picture for residuals. I don't think they look problematic?!

The model is currently being used for optimization and, in the near future, will also be used for prediction. The wrongness of the predictions is not going to be too impactful.

P_Bartell · Apr 13, 2021 04:09 PM

How do the residuals vs. predictor variable plots appear? These can hint at missing terms or effects in the model. And how does the residual vs. run order plot (if you ran a designed experiment) appear? It can indicate non-constant variance or a lurking variable that crept in during experimental execution, either of which could also lead to LOF.
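For anyone who wants a numeric companion to the residual vs. predictor plot, here is a minimal Python sketch, not tied to JMP. The data, the helper names (`fit_line`, `pearson`), and the choice of a squared term as the "candidate missing term" are all made up for illustration. It fits a straight line to a response that really has curvature, then checks whether the residuals still track the squared predictor.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

# Hypothetical data: linear trend plus curvature plus a small alternating wiggle.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
mean_x = sum(x) / len(x)
y = [2.0 + 1.5 * xi + 0.2 * (xi - mean_x) ** 2 + 0.1 * (-1) ** i
     for i, xi in enumerate(x)]

# Fit a model that deliberately omits the squared term.
a, b = fit_line(x, y)
resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Residuals are uncorrelated with a term already in the model...
corr_with_x = pearson(x, resid)
# ...but still track the omitted squared term, which is the "pattern" to look for.
corr_with_x_squared = pearson([(xi - mean_x) ** 2 for xi in x], resid)

print(f"corr(resid, x)   = {corr_with_x:.3f}")
print(f"corr(resid, x^2) = {corr_with_x_squared:.3f}")
```

A strong correlation between the residuals and a candidate term is the numeric version of seeing a U-shape in the plot. It only flags the terms you think to test, so it complements the plot rather than replacing it.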

From the looks of your Predicted vs. Residual plot methinks the primary cause of the LOF is a relatively small variance for most of the replicate points. This makes a really small Pure Error term vs. the LOF error term in the LOF ANOVA and makes a 'significant LOF' p-value more likely.
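To make that concrete, here is a small Python sketch of the lack-of-fit F ratio (lack-of-fit mean square divided by pure-error mean square). The data are invented: four factor levels with three replicates each, a slightly curved true response, and a straight-line model. The function names (`fit_line`, `lof_f_ratio`, `make_data`) are hypothetical. Only the replicate spread changes between the two runs; the model and its misfit are identical.

```python
from collections import defaultdict

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns a prediction function."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return lambda t: a + b * t

def lof_f_ratio(x, y, predict, n_params):
    """Lack-of-fit F ratio: MS(lack of fit) / MS(pure error)."""
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    # Pure error: scatter of replicates around their own group mean.
    ss_pure = 0.0
    for ys in groups.values():
        m = sum(ys) / len(ys)
        ss_pure += sum((v - m) ** 2 for v in ys)
    # Residual SS splits into pure error plus lack of fit.
    ss_resid = sum((yi - predict(xi)) ** 2 for xi, yi in zip(x, y))
    ss_lof = ss_resid - ss_pure
    df_pure = len(y) - len(groups)
    df_lof = len(groups) - n_params
    return (ss_lof / df_lof) / (ss_pure / df_pure)

def make_data(spread):
    """Made-up data: slightly curved response, 3 replicates per level."""
    x, y = [], []
    for level in (1.0, 2.0, 3.0, 4.0):
        center = 10.0 * level + 0.05 * level ** 2
        for offset in (-spread, 0.0, spread):
            x.append(level)
            y.append(center + offset)
    return x, y

# Same straight-line model and same misfit; only the replicate scatter differs.
x_wide, y_wide = make_data(spread=1.0)
x_tight, y_tight = make_data(spread=0.01)
f_wide = lof_f_ratio(x_wide, y_wide, fit_line(x_wide, y_wide), n_params=2)
f_tight = lof_f_ratio(x_tight, y_tight, fit_line(x_tight, y_tight), n_params=2)

print(f"F ratio with noisy replicates: {f_wide:.3f}")
print(f"F ratio with tight replicates: {f_tight:.1f}")
```

In this toy case the line misses the group means by only 0.05 units on a response that runs from about 10 to 40, yet the F ratio goes from roughly 0.015 with noisy replicates to roughly 150 with tight ones. With 2 and 8 degrees of freedom the 5% critical value is about 4.46, so the second case would be flagged as significant LOF even though the misfit itself is unchanged.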

As you state, the magnitude of the difference between the actual and predicted values is not practically significant, so if the model can still be used for its intended purpose...oh well, you've got LOF. Who cares?

Amir_H · Apr 13, 2021 04:38 PM

Thanks again for the kind answer @P_Bartell. I agree about the relatively small variance for most of the replicate points.

To answer your questions, I have included another screenshot.
