First, welcome to the community, and an interesting first post. The typical enumerative statistics used to evaluate models are ALL contingent on the model you specify.
Two observations an experimenter should keep in mind:
1. Statistical significance is a conditional statement. Sources of variation (design factors) are compared to other sources of variation (the noise changing treatment-to-treatment) under a certain set of conditions (the noise held constant during the DOE, a.k.a. the inference space). If the sources or the conditions change, so may statistical significance.
2. The extrapolation of experimental results is an engineering and managerial decision, not a statistical one. It is largely influenced by how representative the study is of future conditions.
If you change the terms in the model, then those statistics can and likely will change. When you remove terms from the model, they are pooled into the error term. This may inflate or deflate the estimate of the random errors as quantified by the MSE. For example, if you remove insignificant terms from the model, not only do the sums of squares associated with those terms pool into the error term, but so do their corresponding degrees of freedom. If those terms truly are insignificant, their mean squares are small relative to the error, so pooling them will typically reduce the MSE and increase the F-ratios (decrease the p-values) of the remaining terms. In essence, you control statistical significance, as you are the one planning the experiment: what will be in the model and how representative the experiment is of future conditions.
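A quick numerical sketch of that pooling, using made-up data and plain numpy (the 2^3 design, effect sizes, and seed here are all illustrative assumptions, not real experimental results):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2^3 factorial in coded units; C has no real effect
A = np.array([-1, -1, -1, -1, 1, 1, 1, 1])
B = np.array([-1, -1, 1, 1, -1, -1, 1, 1])
C = np.array([-1, 1, -1, 1, -1, 1, -1, 1])
y = 10 + 3 * A + 2 * B + rng.normal(0, 0.5, 8)

def fit(X, y):
    """Least-squares fit; return error SS, error df, and MSE."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sse = resid @ resid
    df = len(y) - X.shape[1]
    return sse, df, sse / df

ones = np.ones(8)
sse_full, df_full, mse_full = fit(np.column_stack([ones, A, B, C]), y)
sse_red,  df_red,  mse_red  = fit(np.column_stack([ones, A, B]), y)

# Dropping C pools its sum of squares AND its 1 df into the error term
ss_C = sse_red - sse_full
print(f"full model   : SSE={sse_full:.3f}, error df={df_full}, MSE={mse_full:.3f}")
print(f"reduced model: SSE={sse_red:.3f}, error df={df_red}, MSE={mse_red:.3f}")
print(f"SS pooled from C: {ss_C:.3f}, extra error df: {df_red - df_full}")
```

Because C is pure noise here, its mean square is small and pooling it tends to shrink the MSE, which in turn sharpens the F-ratios for A and B. Had C carried a real effect, pooling it would inflate the MSE instead.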
BTW, be cautious of R-square. It will always increase as you add degrees of freedom to the model (whether the terms you add are significant or not). Better to evaluate the delta between R-square and R-square adjusted when refining your model: a widening gap is a sign you are fitting noise.
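To see that R-square is guaranteed to go up while adjusted R-square can go down, here is a small sketch (again with invented data: the true model, the pure-noise predictor, and the seed are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

n = 20
x = np.linspace(0, 1, n)
y = 2 + 5 * x + rng.normal(0, 1, n)
junk = rng.normal(size=n)  # a predictor with no relationship to y

def r2_stats(X, y):
    """Return (R-square, adjusted R-square) for a least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sse = resid @ resid
    sst = ((y - y.mean()) ** 2).sum()
    p = X.shape[1]  # parameters, including the intercept
    r2 = 1 - sse / sst
    r2_adj = 1 - (sse / (n - p)) / (sst / (n - 1))
    return r2, r2_adj

ones = np.ones(n)
r2_small, adj_small = r2_stats(np.column_stack([ones, x]), y)
r2_big,   adj_big   = r2_stats(np.column_stack([ones, x, junk]), y)

print(f"x only    : R2={r2_small:.4f}, adj R2={adj_small:.4f}")
print(f"x + noise : R2={r2_big:.4f}, adj R2={adj_big:.4f}")
```

The R-square of the larger model can never be lower than that of the smaller one, because least squares can always set the extra coefficient to zero. Adjusted R-square penalizes the wasted degree of freedom, so it may drop when the added term does not earn its keep.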
"All models are wrong, some are useful" G.E.P. Box