Re: Full Factorial Design questions

CYLiaw · May 4, 2020 02:20 PM

Hi!

I have a full factorial design with four factors. Three of the factors have three levels and the other one has five levels. So I have 3x3x3x5 = 135 total runs with no replicates. I have a few questions as follows:

1. After running the model, it shows significant Lack of Fit. However, there are no replicated data points. How can JMP perform the Lack of Fit test then?

2. Since it shows lack of fit, my understanding is that more terms should be included in the model. Can I add the quadratic terms to the model? Or is the full factorial design only used for studying the main effects and the interaction effects?

3. I tried adding the quadratic terms to the model; however, the Lack of Fit test was grayed out after running the model. Why is that? I don't think my model is saturated, since there are more data points than predictors.

4. Can a full factorial design with three levels (-1 0 +1) be considered a response surface model because it also has factors at the mid-level? Why can't a full factorial design with factors that have three levels capture the curvature of the response surface?

Thank you!

Mark_Bailey · May 4, 2020 02:49 PM

Is your model such that you only have first-order terms? If so, then you have a kind of replication and meet all the other requirements for the lack of fit test.

Yes, you can add quadratic terms. Your data might support the more complex model, but there is no guarantee. I think it will work in your case, though.

You no longer meet the requirements for the lack of fit test: replicates and number of distinct levels relative to the order of the model.

The term 'response surface design' generally refers to classic designs such as the Box-Wilson (central composite) and Box-Behnken designs, but any design used for optimizing factor levels could be called an RSM design.

The three-level full factorial design can be used to estimate the second-order model.
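
One way to check this numerically is to build the second-order model matrix for the 3x3x3x5 design and confirm that it has full column rank. The sketch below uses Python with NumPy rather than JMP. The coded levels are assumptions, since the thread does not give the actual factor settings.

```python
from itertools import combinations, product

import numpy as np

# Assumed coded levels; the actual factor settings are not given in the thread.
levels = [
    [-1.0, 0.0, 1.0],
    [-1.0, 0.0, 1.0],
    [-1.0, 0.0, 1.0],
    [-1.0, -0.5, 0.0, 0.5, 1.0],
]

# Full factorial: every combination of levels, one run each (no replicates).
design = np.array(list(product(*levels)))
n_runs, n_factors = design.shape

# Second-order model: intercept, main effects, two-factor interactions, squares.
columns = [np.ones(n_runs)]
columns += [design[:, i] for i in range(n_factors)]
columns += [design[:, i] * design[:, j] for i, j in combinations(range(n_factors), 2)]
columns += [design[:, i] ** 2 for i in range(n_factors)]
X = np.column_stack(columns)

n_terms = X.shape[1]
rank = np.linalg.matrix_rank(X)
print("runs:", n_runs, "terms:", n_terms, "rank:", rank, "residual df:", n_runs - n_terms)
```

If the rank equals the number of terms, every coefficient in the second-order model is estimable from this design, and the runs left over after fitting supply the residual degrees of freedom.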

CYLiaw · May 4, 2020 03:58 PM

Hi Mark,

Thanks for your reply. Yes, my model only has the first-order and the interaction terms. However, I still don't quite understand what you mean by 'you have a kind of replication and meet all the other requirements for the lack of fit test.' What is a kind of replication? I didn't have any replicated measurement under the same conditions. And what are the other requirements for the lack of fit test? You mentioned 'number of distinct levels relative to the order of the model'. Can you please explain it in more detail as well? Thank you very much!

Mark_Bailey · May 4, 2020 1:43 PM

Let's say that I design a full factorial experiment for 5 factors, each at 3 levels. Then later in the analysis, I eliminate all of the terms that include 1 of the factors. Whenever this kind of model reduction occurs, the design space (e.g., 5-D) projects into a smaller number of dimensions (e.g., 4-D), and the original design is now a replicated design for the remaining 4 factors. Even a full factorial design for 2 factors, each at 2 levels, exhibits some replication. For example, each factor is tested at each level twice. So as @statman pointed out, it depends on how you use your degrees of freedom. They are pooled, for example, for the error sum of squares. They provide a model-dependent estimate of the RMSE, but hey, not a bad deal.
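
The projection argument can be checked by counting. The following is a minimal sketch in plain Python (not JMP), using coded levels -1, 0, +1 as an assumption: it builds the 5-factor, 3-level full factorial, drops one factor, and counts how many runs land on each remaining 4-factor combination.

```python
from collections import Counter
from itertools import product

# Full factorial for 5 factors at 3 levels each: 3**5 unique runs.
runs = list(product([-1, 0, 1], repeat=5))

# Drop the last factor from the model: project each run onto the first 4 factors.
projected = Counter(run[:4] for run in runs)

n_runs = len(runs)
n_distinct = len(projected)
replicates_per_point = set(projected.values())

# Degrees of freedom that the textbook lack-of-fit test would treat as pure error.
pure_error_df = n_runs - n_distinct
print(n_runs, n_distinct, replicates_per_point, pure_error_df)
```

These count as replicates only if the dropped factor really has no effect, which is why the resulting error estimate is model-dependent.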

The lack of fit test requires (1) replicates and (2) that each factor have at least 2 more levels than the order of the model. So if you have a first-order model (order 1), you need 3 distinct levels. If you have a quadratic model (order 2), then you need 4 distinct levels. But you do not need to replicate the entire design.
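
Both requirements show up as degrees of freedom in the textbook lack-of-fit decomposition, where the residual sum of squares is split into pure error (replicates around their own mean) and lack of fit (the remainder). The sketch below is a NumPy illustration of that decomposition, not JMP's implementation. The function name `lack_of_fit` and the data are made up for the example.

```python
import numpy as np


def lack_of_fit(X, y):
    """Return (F, df_lof, df_pe) for the textbook lack-of-fit test, or None.

    X is the model matrix (intercept included, assumed full column rank).
    Rows with identical X values are treated as replicates.
    """
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = np.sum((y - X @ beta) ** 2)

    # Pure error: scatter of replicates around their own group mean.
    _, group = np.unique(X, axis=0, return_inverse=True)
    group = group.ravel()
    counts = np.bincount(group)
    means = np.bincount(group, weights=y) / counts
    ss_pe = np.sum((y - means[group]) ** 2)

    df_pe = n - len(counts)   # zero unless there are replicates
    df_lof = len(counts) - p  # zero unless distinct points outnumber parameters
    if df_pe <= 0 or df_lof <= 0:
        return None  # the test cannot be computed
    f_ratio = ((sse - ss_pe) / df_lof) / (ss_pe / df_pe)
    return f_ratio, df_lof, df_pe


# Made-up data: one factor at 3 levels, 2 replicates each, clearly curved response.
x = np.array([-1.0, -1.0, 0.0, 0.0, 1.0, 1.0])
y = np.array([1.1, 0.9, 0.1, -0.1, 1.0, 1.0])

first_order = np.column_stack([np.ones_like(x), x])
second_order = np.column_stack([np.ones_like(x), x, x**2])

result_first = lack_of_fit(first_order, y)    # 3 levels, order 1: test available
result_second = lack_of_fit(second_order, y)  # 3 levels, order 2: no df left
print(result_first, result_second)
```

With three levels, the first-order model leaves one degree of freedom for lack of fit, while the quadratic model uses all three distinct points and leaves none, so the function returns None in that case.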

CYLiaw · May 4, 2020 05:50 PM

Thank you, Mark, for the clarification. It makes more sense now. I have one more question: can I still use the model that includes the quadratic terms even though it doesn't allow for the Lack of Fit test?

Mark_Bailey · May 4, 2020 08:05 PM

The lack of fit test would be used to see if the quadratic model is insufficient for the response over the design space. Do you think that is a real possibility, or that any higher-order effects would amount to more than a few percent?

Whether or not you use the LOF test, you should confirm the selected model with future empirical observations. That is, use the model to predict the response under new conditions (not previously observed) and test those conditions. I recommend predicting conditions that give you what you want and what you do not want. A good model should predict reality, good or bad.

CYLiaw · May 4, 2020 09:33 PM

If I intend to get an explanatory model (i.e., identifying the most important factors and interactions) instead of a predictive model, is it still necessary to predict the response under new conditions? I have seen people do cross-validation; is that different from what you suggested? Also, what do you mean by 'predicting conditions that give you what you want and what you do not want'?

Thanks!

statman · May 4, 2020 10:06 PM

Wow, this is getting interesting.... So some thoughts:

First, I'm a bit confused by the line of work. You started with 3 factors at 3 levels and 1 factor at 5 levels in a full factorial. All continuous variables. That would suggest to me that you are already in the optimum space? If you were screening, this would not be a first experiment; there are far more efficient ways to get there. Given the experiment you ran, it would seem reasonable that you already understand the first-order effects and are trying to map the surface. To be honest, I don't understand this mix of factors and levels, but, of course, I do not know the situation.

Typically, when folks talk about validation and cross-validation of a model, it is because they created the model using some sort of regression on an existing data set. The model you have gotten from your factorial should be better evidence of causal relationships (not just a model that explains the data). So you should have a model from your experiment (simplified to the significant factors). Now go test it using new data (as Mark suggests). This is the scientific method. Or even better, see where your model fails (under what conditions does the model fall apart?).

Since you spent so much effort on understanding the design factors, you should already know the effects of noise; if not, you should spend some of your resources understanding this. It doesn't do any good having a detailed map of the base of the mountain when you are trying to get to the top.

"All models are wrong, some are useful" G.E.P. Box

Mark_Bailey · May 5, 2020 05:58 AM

The decisions that you make based on the explanatory model use statistics that still assume that the model is correct. For example, the F ratios in the Effect Tests or the t ratios in the Parameter Estimates from JMP assume that the error sum of squares reflects only the random deviations in the response. You can only use the current data set to estimate the model parameters or select the terms in the model. You need independent evidence to decide if the model is correct. Again, cross-validation is for model selection, not model validation.

I meant that I would find conditions for which the selected model predicts a good response (e.g., high yield) as well as conditions that are expected to have a bad response (e.g., low yield). Reality ranges from bad to good. A realistic model should reproduce all of this reality. Then I am reasonably confident in my decisions (or predictions) based on the model.
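
As an illustration of picking such confirmation conditions, the sketch below fits a second-order model and then looks for the untested settings with the highest and lowest predicted response. Everything in it is hypothetical: the 3x3 design, the simulated response, and the candidate grid are made up for the example and are not data from this thread.

```python
from itertools import product

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3x3 factorial and simulated response; not data from the thread.
design = np.array(list(product([-1.0, 0.0, 1.0], repeat=2)))
x1, x2 = design[:, 0], design[:, 1]
y = 50 + 5 * x1 - 3 * x2 - 4 * x1**2 + rng.normal(0, 0.5, len(design))


def model_matrix(a, b):
    """Second-order model in two factors: 1, a, b, a*b, a^2, b^2."""
    return np.column_stack([np.ones_like(a), a, b, a * b, a**2, b**2])


beta, *_ = np.linalg.lstsq(model_matrix(x1, x2), y, rcond=None)

# Candidate settings inside the design space that were not run in the experiment.
candidates = np.linspace(-0.75, 0.75, 4)
grid = np.array(list(product(candidates, repeat=2)))
pred = model_matrix(grid[:, 0], grid[:, 1]) @ beta

best, worst = grid[np.argmax(pred)], grid[np.argmin(pred)]
print("confirm a good response at", best, "predicted", pred.max())
print("confirm a bad response at", worst, "predicted", pred.min())
```

The confirmation step itself is experimental: run both settings and compare the observed responses with the two predictions.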

statman · May 4, 2020 03:07 PM

Mark, as usual, has given the responses you asked about. I just have some questions/feedback to add:

First question: are the factors all continuous?

1. Yes, you left terms out of the model, hence the lack of fit dialogue. You could write a saturated model for the 134 degrees of freedom (135 runs minus 1 for the intercept), but that would include cubic and quartic terms (and their interactions) for the 5-level factor and other non-linear interaction terms.

2. Full factorials are full-resolution designs, so every degree of freedom can be estimated. Whether every term makes sense or not is a different question.

3. I'm not sure, but my guess is the non-linear interaction terms are not considered to be included in the model. I notice that when you use JMP to construct model effects and use the Response Surface Macro, the non-linear interaction terms are pooled in the MSE.

4. Doesn't matter what you call it, you can certainly estimate the quadratic effects for a factorial with factors at 3-levels.
