Solved: How does removing terms in a full factorial model influence summary of fit?

PearsonLamb615 · Jun 8, 2023 2:13 PM

Hi all,

Can you help me with this question:

"How does removing terms from a full factorial model influence the model performance? (For example, will the R-square value, confidence intervals, etc. increase or decrease?)"

Best,

Céline

statman · Dec 20, 2022 09:58 AM

First, welcome to the community. An interesting first post... The typical enumerative statistics used to evaluate models are ALL contingent on the model.

Two observations an experimenter should keep in mind:

1. Statistical significance is a conditional statement. Sources of variation (design factors) are compared to other sources of variation (the noise changing treatment-to-treatment) under a certain set of conditions (the noise held constant during the DOE, aka inference space). If the sources or the conditions change, so may statistical significance.

2. The extrapolation of experimental results is an engineering and managerial decision, not a statistical one. It is largely influenced by how representative the study is of future conditions.

If you change the terms in the model, then those statistics can and likely will change. When you remove terms from the model, they are pooled into the error term. This may inflate or deflate the estimate of the random error as quantified by the MSE. For example, if you remove insignificant terms from the model, both the sums of squares associated with those terms and the corresponding degrees of freedom pool into the error term. When the removed terms really are negligible, this typically reduces the MSE, which increases the F-ratios (and decreases the p-values) of the terms that remain. In essence, you control statistical significance, as you are the one planning the experiment: what will be in the model and how representative the experiment is of future conditions.
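To make the pooling concrete, here is a minimal sketch in Python with NumPy (not JMP output). The replicated 2^3 design, the simulated response, and the helper name `fit` are all assumptions made up for illustration. It fits the full model and a reduced model and prints the error sum of squares, error degrees of freedom, and MSE for each.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2^3 full factorial in coded units (-1/+1), replicated twice.
runs = np.array(list(itertools.product([-1, 1], repeat=3)) * 2, dtype=float)
A, B, C = runs.T

# Made-up response: only A and the A*B interaction are truly active.
y = 10 + 3 * A + 2 * A * B + rng.normal(scale=1.0, size=len(A))

def fit(columns):
    """Least-squares fit with an intercept; returns (SSE, error df, MSE)."""
    X = np.column_stack([np.ones_like(y)] + columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sse = float(resid @ resid)
    df_error = len(y) - X.shape[1]
    return sse, df_error, sse / df_error

# Full model: all main effects and interactions.
full = fit([A, B, C, A * B, A * C, B * C, A * B * C])

# Reduced model: C, A*C, B*C and A*B*C are removed, so their sums of
# squares and their 4 degrees of freedom move into the error term.
reduced = fit([A, B, A * B])

for name, (sse, df_error, mse) in [("full", full), ("reduced", reduced)]:
    print(f"{name:8s} SSE={sse:7.3f}  error df={df_error:2d}  MSE={mse:6.3f}")
```

The error sum of squares can only grow when terms are removed, but the error degrees of freedom grow too, so the MSE can move in either direction depending on how large the removed terms' mean squares are relative to the original MSE.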

BTW, be cautious of R-square. It never decreases as you add degrees of freedom to the model (whether the degrees of freedom you add are significant or not). Better to evaluate the delta between R-square and R-square adjusted when refining your model.
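A small sketch of that comparison, again in Python with NumPy and again on a made-up replicated 2^3 design (the data and the helper name `r2_and_adjusted` are hypothetical). It uses the usual definitions R-square = 1 - SSE/SST and adjusted R-square = 1 - (SSE/(n - p)) / (SST/(n - 1)), where p counts the model parameters including the intercept.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical replicated 2^3 full factorial in coded units.
runs = np.array(list(itertools.product([-1, 1], repeat=3)) * 2, dtype=float)
A, B, C = runs.T

# Made-up response: only A and A*B are truly active.
y = 10 + 3 * A + 2 * A * B + rng.normal(scale=1.0, size=len(A))

def r2_and_adjusted(columns):
    """Return (R-square, adjusted R-square) for a model with an intercept."""
    X = np.column_stack([np.ones_like(y)] + columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = float(np.sum((y - X @ beta) ** 2))
    sst = float(np.sum((y - y.mean()) ** 2))
    n, p = X.shape
    r2 = 1 - sse / sst
    r2_adj = 1 - (sse / (n - p)) / (sst / (n - 1))
    return r2, r2_adj

r2_full, adj_full = r2_and_adjusted([A, B, C, A * B, A * C, B * C, A * B * C])
r2_red, adj_red = r2_and_adjusted([A, B, A * B])

print(f"full:    R2={r2_full:.4f}  adj R2={adj_full:.4f}  delta={r2_full - adj_full:.4f}")
print(f"reduced: R2={r2_red:.4f}  adj R2={adj_red:.4f}  delta={r2_red - adj_red:.4f}")
```

The full model's R-square is never lower than the reduced model's, since it contains every term the reduced model does. The adjusted value carries a penalty for each degree of freedom spent, so a wide gap between the two suggests the model includes terms that are not earning their keep.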

"All models are wrong, some are useful" G.E.P. Box

Jed_Campbell · Dec 20, 2022 7:03 AM

If you want a more in-depth understanding than what @statman gave, the excellent and free online Statistical Thinking for Industrial Problem Solving course (STIPS) has a really approachable explanation of this, specifically in the Correlation and Regression module.
