Solved: Very High R square and significant lack of fit problem

Ella · Oct 9, 2020 01:56 AM

Hi,

I made a custom design for 7 factors with a model containing main effects, interactions, and second-order (quadratic) terms. The sample size is 150 runs. I found that 3 of the factors are important and built a model with these 3 variables.

However, I ran into a significant lack of fit together with an R^2 value of 0.99.

I searched a lot for the reasons behind this situation, but I could not find a clear answer.

Do you have any idea what causes this problem?

Kind regards.

SaraA · Apr 13, 2024 03:56 PM

Hi @P_Bartell

"If it's prediction across the factor space, how impactful is the magnitude of the prediction error across the experimental space from a PRACTICAL point of view?"

Could you explain exactly what you meant here? Did you mean that lack of fit, as a measure of the model's prediction error, could be of little significance if the purpose of the DOE is to see the effect of the factors on the response across a certain experimental space, as opposed to performing a DOE to optimize a response?


statman · Apr 15, 2024 10:14 AM

Just to reiterate, R-square by itself must be interpreted carefully. R-square never decreases as you add terms (degrees of freedom) to the model. But the point of creating a model is to include only the useful terms (active and significant). So you should look at the delta between R-square and adjusted R-square. A growing delta is an indicator that your model is overspecified.
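As a rough illustration of that delta, here is a minimal Python sketch using the standard adjusted R-square formula. The run count, term counts, and R-square values below are made-up numbers for illustration only, not values from this thread.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R-square for n observations and p model terms (intercept not counted)."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than model terms")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Hypothetical comparison: the same 20 runs, a lean model vs. a heavily parameterized one.
lean_r2 = 0.90
lean_adj = adjusted_r2(lean_r2, n=20, p=3)
big_r2 = 0.93
big_adj = adjusted_r2(big_r2, n=20, p=12)

lean_gap = lean_r2 - lean_adj
big_gap = big_r2 - big_adj
print(f"lean model:  R2={lean_r2:.3f}  adj R2={lean_adj:.3f}  gap={lean_gap:.3f}")
print(f"large model: R2={big_r2:.3f}  adj R2={big_adj:.3f}  gap={big_gap:.3f}")
```

In this made-up case the larger model has the higher R-square but the lower adjusted R-square, which is the pattern to watch for.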

Looking at your residuals, there is an indication of a non-linear effect. You should consider adding a quadratic effect to your model, as suggested by @modelFit.

"All models are wrong, some are useful" G.E.P. Box


ThuongLe · Oct 9, 2020 02:33 AM

Is it possible to share your data and script?

Thuong Le

Dan_Obermiller · Oct 9, 2020 04:00 AM

A lack of fit test compares the variance from the "model not fitting" to the variance of the replicated points (pure error). A significant result means that the "model not fitting" variance is larger than the pure error. The most likely cause is that the model form you are using is not appropriate for the data.

Consider this picture:

It shows a high RSquare value, yet a significant lack of fit, because a straight line does not describe the relationship. Remember that a high RSquare by itself does not mean you have a good model. A high RSquare means you have explained much of the variance in the response, but you still may not have a good model. You must consider other items such as model significance, residual analysis, etc. to determine if you have a good model.
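For anyone who wants to reproduce this situation numerically, here is a minimal Python sketch of the lack-of-fit calculation for a straight-line fit to replicated data. The data are invented (a curved response with two tight replicates per level), and the function name `lack_of_fit_line` is just a label chosen for this example. It is a hand-rolled illustration of the textbook decomposition, not JMP's implementation.

```python
from collections import defaultdict
from statistics import mean

def lack_of_fit_line(x, y):
    """Fit y = b0 + b1*x by least squares and split the residual sum of
    squares into pure error (within replicates) and lack of fit."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar

    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # residual SS
    sst = sum((yi - ybar) ** 2 for yi in y)                        # total SS

    # Pure error: variation of replicates around their own group mean.
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    ss_pe = sum(sum((yi - mean(g)) ** 2 for yi in g) for g in groups.values())
    ss_lof = sse - ss_pe

    df_lof = len(groups) - 2   # distinct x levels minus 2 fitted parameters
    df_pe = n - len(groups)    # observations minus distinct x levels
    f_ratio = (ss_lof / df_lof) / (ss_pe / df_pe)
    r2 = 1.0 - sse / sst
    return r2, f_ratio, df_lof, df_pe

# Invented data: response follows x**2, replicates differ by only +/- 0.1.
x = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
y = [0.9, 1.1, 3.9, 4.1, 8.9, 9.1, 15.9, 16.1, 24.9, 25.1]

r2, f_ratio, df_lof, df_pe = lack_of_fit_line(x, y)
print(f"R2 = {r2:.3f}, lack-of-fit F = {f_ratio:.1f} on ({df_lof}, {df_pe}) df")
```

With these numbers the straight line explains roughly 96% of the variance, yet the lack-of-fit F ratio is in the hundreds, because the curvature the line misses is large relative to the replicate scatter.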

So I would recommend plotting your model residuals to see if there are any patterns. A pattern would indicate what might be wrong with your current model form. Also check to ensure that the residuals follow a normal distribution as that could also lead to a lack of fit situation.

Finally, it is possible to get a significant lack of fit test if your replicated points have an extremely low variance rather than the model form being incorrect. This does not happen that often, but I have seen that occur. In such situations, you should look into determining why or how you achieved such a low variance. That can provide some good insights.
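To see why very tight replicates can trigger the test on their own: the F ratio divides the lack-of-fit mean square by the pure-error mean square, so shrinking the denominator inflates the ratio even when the misfit is unchanged. A minimal Python sketch with hypothetical sums of squares (not taken from this thread):

```python
def lof_f_ratio(ss_lof: float, df_lof: int, ss_pe: float, df_pe: int) -> float:
    """Lack-of-fit F ratio: MS(lack of fit) / MS(pure error)."""
    return (ss_lof / df_lof) / (ss_pe / df_pe)

# Same modest misfit in both cases; only the replicate (pure error) scatter changes.
ss_lof, df_lof, df_pe = 0.30, 6, 20

f_typical = lof_f_ratio(ss_lof, df_lof, ss_pe=1.00, df_pe=df_pe)  # ordinary replicate scatter
f_tight = lof_f_ratio(ss_lof, df_lof, ss_pe=0.01, df_pe=df_pe)    # extremely tight replicates

print(f"F with ordinary replicate variance: {f_typical:.2f}")
print(f"F with very small replicate variance: {f_tight:.2f}")
```

The first ratio is about 1 and the second about 100, although the model misses the data by exactly the same amount in both cases.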

Dan Obermiller

Ella · Oct 9, 2020 08:52 AM

Thank you very much Dan,

I am facing exactly the problems you described.

There is a normality problem in the error terms, I think there is some trend in my error terms (if you have a different opinion, please share it with me), and the pure error variation is very, very small.

If I redesign the experiment and include cubic terms for the factors, will that solve this problem?

(By the way, I know my dependent variable depends on other factors that are not included in the model, but I held them constant when I made the DOE and ran the experiments. I guess this does not cause the problem, right?)

P_Bartell · Oct 9, 2020 10:52 AM

I concur with all that @Dan_Obermiller contributed. From your residual plot it looks like the root cause of the lack of fit is a relatively small replicate response variance compared to the total response variance.

Here's another thought for you to consider...what is the purpose of the experiment/study? If it's prediction across the factor space, how impactful is the magnitude of the prediction error across the experimental space from a PRACTICAL point of view? If it makes little practical difference, then, quite frankly, I couldn't care less that you have LOF.

Ella · Oct 9, 2020 12:06 PM

Exactly, the reason for the LOF is the very small variance of the replicates, as also seen in the pure error value.

The aim of this study is to model the dependent variable and then optimize to find the factor levels that put the dependent variable between 0.35 and 0.36.
If you look at my current model's residuals, they range between -0.3 and 0.3, which is very large for my target. That's why I am trying to find a better model of my dependent variable to make a better forecast.

P_Bartell · Oct 9, 2020 01:28 PM

@Ella If your target value is somewhere between 0.35 and 0.36, your predicted variation is not -0.3 to 0.3 but closer to +/- 0.1. And I suspect the largest contributor to the residual variation in this space is due to factor changes...not LOF. Take a look at the residual plot in the neighborhood of 0.35 and 0.36 to see my point. Here's a follow up thought for you:

I suggest taking your analysis all the way through to the JMP Prediction Profiler and then using JMP's Desirability, and Desirability optimization capabilities to find a factor space optimum centered on 0.355 with upper and lower JMP Desirability Function bounds at 0.35 and 0.36. Then if you have some process knowledge of how much to expect the factors to vary at those factor recommendation/optimums, run a Monte Carlo simulation within the Prediction Profiler to see what the variation in your response is. If your simulated factor variation in the simulation is relatively large, then LOF will be the least of your issues.
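Outside of JMP, the idea behind that simulation step can be sketched in a few lines of Python. Everything specific here is an assumption made up for illustration: the `predict` coefficients stand in for a fitted model, the setpoint and factor standard deviations stand in for process knowledge, and the triangular `desirability` is a simplified stand-in, not JMP's actual desirability function.

```python
import random
from statistics import mean, stdev

def predict(x1, x2, x3):
    """HYPOTHETICAL fitted quadratic model in coded units (made-up coefficients)."""
    return (0.355 + 0.05 * x1 - 0.03 * x2 + 0.02 * x3
            + 0.04 * x1 * x2 - 0.01 * x1 ** 2)

def desirability(y, low=0.35, target=0.355, high=0.36):
    """Simplified target desirability: 1 at the target, 0 at or beyond the bounds."""
    if y <= low or y >= high:
        return 0.0
    if y <= target:
        return (y - low) / (target - low)
    return (high - y) / (high - target)

def simulate(setpoint, factor_sd, n_runs=10_000, seed=1):
    """Draw each factor from a normal distribution around its setpoint and
    push the draws through the model to get a distribution of responses."""
    rng = random.Random(seed)
    return [predict(*(rng.gauss(mu, sd) for mu, sd in zip(setpoint, factor_sd)))
            for _ in range(n_runs)]

setpoint = (0.0, 0.0, 0.0)       # assumed optimum found by the optimizer
factor_sd = (0.05, 0.05, 0.05)   # assumed run-to-run variation of each factor

responses = simulate(setpoint, factor_sd)
in_spec = sum(0.35 <= y <= 0.36 for y in responses) / len(responses)

print(f"desirability at the setpoint: {desirability(predict(*setpoint)):.2f}")
print(f"simulated response mean: {mean(responses):.4f}, sd: {stdev(responses):.4f}")
print(f"fraction of simulated runs inside 0.35 to 0.36: {in_spec:.1%}")
```

Increasing the values in `factor_sd` widens the simulated response distribution and lowers the in-spec fraction, which is the comparison against lack of fit that the simulation is meant to make visible.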

Ella · Oct 10, 2020 02:27 AM

Thanks a lot, Mr. Bartell. I see your point and it is very logical, but I cannot quite understand your suggestion about the Monte Carlo simulation. Could you explain it in detail?

P_Bartell · Oct 12, 2020 03:17 AM

@Ella The detailed steps to execute the workflow I suggested in my latest reply to this thread are far too many and too lengthy to articulate in a JMP User Community Discussion Forum thread, and some depend more on domain expertise than on statistical theory or technique. I have two suggestions for you from this point forward:

1. Since it sounds like you have little to no previous experience using JMP, DOE, and modeling for process optimization problems, find someone in your organization, or hire a consultant, who has done this before and has some experience. If nothing else, they'll be able to guide you through the platform workflows, offer suggestions along the way, and ask thought-provoking questions that will help you continue to frame your problem and approaches to a solution.

2. If you are truly all alone and have no other technical resources available for a detailed workflow for what I'm suggesting, I recommend looking at this JMP Blog and video written by my former (I'm retired) colleague @robert_anderson . The blog post shows the basic workflow and offers some guidance on navigating the JMP ecosystem to operationalize the workflow I'm suggesting. Here's a link to the blog post:
