Re: Which is the better custom DOE design?

elitesky · Apr 22, 2020 01:44 PM

I am designing a custom DOE with 8 factors (5 are continuous, 3 are categorical - 2 level). I would like to include all 2nd order interactions along with 4 center points. I am trying to decide between two options for DOE design:

1.) Including the 2nd order interactions in the model (48 runs total)

Using this method, JMP suggests a default of 48 runs. The color map on correlations looks great, with my 1st and 2nd order terms at 1 and everything else close to 0. However, the efficiencies are low (D = 82.5, G = 44.1, and A = 72.5). Also, average variance of prediction is 0.41. The prediction variance profiles give me a maximum variance of 1.85, and my fraction of design space plot ranges from 0.3 to 1.75.

2.) Including the 2nd order interactions as alias terms (48 runs total)

Using this method, JMP suggests a default of 20 runs, but I manually increase the number to 48 for direct comparison against option 1. The color map on correlations looks notably worse, with a significant number of terms close to 0.5. Everything else about the design looks better, though. Efficiencies are much higher (D = 95.3, G = 95.2, and A = 95.2). The average variance of prediction is 0.12. The prediction variance profiles give me a maximum variance of 0.20, and my fraction of design space plot ranges from 0.08 to 0.20.

Please help me to pick between the two options and let me know which of these design evaluation metrics are most important. Thanks!

statman · Apr 22, 2020 02:19 PM

I don't have a good answer for you, but I do have some questions/comments for you to consider:

1. How are you creating center points when you have 3 categorical factors? Why do you suspect curvature inside your design space?

2. I believe you are missing a major consideration in design selection and that is how will you handle noise? It doesn't do much good to have high resolution when the results can't be replicated (akin to mapping the base of the mountain when your objective is to get to the top of the mountain). I suggest you consider both design factor resolution and "resolution" of noise effects.

3. I suggest you rank order the following effects to help develop pros/cons for each possible design being considered.

*Possible Effect*	*Possible Strategy*	*Rank*
Noise	RIBD or BIB's
Main Effects (1st Order)	Res. III
Two-Factor Interactions	Res. IV or Higher
Noise-by-factor interactions	RCBD
Simple curvature	Center points
Complex non-linear (≥cubic)	RSM
≥3rd order linear	Full Factorial
Stability & Leverage	Sampling
Measurement uncertainty	Nested MSE
Mean/variation	Repeats, Y's

4. Create a prioritized list (rank order) of model effects up to 2nd order. If that list has all 1st order effects followed by 2nd order effects, use lower resolution. As the 2nd order effects rise to the top of the list, increase the resolution. With 8 main effects, it seems you are fairly far from the optimal design space. Likely you will need to move the space through iterations. So if you are moving the space, do the interactions help you?

5. Lastly, I suggest for each possible design, you predict every possible outcome and how would those outcomes affect your next iteration.

The best design you'll ever design is the one you design after you run it!

"All models are wrong, some are useful" G.E.P. Box

elitesky · Apr 22, 2020 02:49 PM

Hi @statman,

Thank you for your quick response. I will attempt to answer your questions below. A quick note though, I realized that I was incorrectly comparing the DOE design evaluation metrics. I now understand that JMP will provide design evaluation variance metrics based on the model terms and not the alias terms. Adding the 2nd order interactions to the evaluation of DOE option 2 significantly deteriorates the metrics to be worse than those of option 1.

1.) It is possible for a couple of our continuous factors to have curvature, although I am not sure. Perhaps this would be a good screening exercise ahead of the DOE.

2.) Good point. We have a budget for 30-40 runs, so I am trying to optimize within that range. Perhaps 2nd order interactions are not as critical if they are going to cost us on noisy data which could muddy the results of the main effects.

3.) Ranking each of these:

*Possible Effect*	*Possible Strategy*	*Rank*
Noise	RIBD or BIB's	2
Main Effects (1st Order)	Res. III	1
Two-Factor Interactions	Res. IV or Higher	3
Noise-by-factor interactions	RCBD	5
Simple curvature	Center points	6
Complex non-linear (≥cubic)	RSM	10
≥3rd order linear	Full Factorial	9
Stability & Leverage	Sampling	7
Measurement uncertainty	Nested MSE	8
Mean/variation	Repeats, Y's	4

4.) 1st order effects will likely trump all 2nd order effects in this case.

5.) I will try to predict outcomes and see how that would guide the subsequent iterations.

Thank you!

Mark_Bailey · Apr 22, 2020 03:20 PM

I am butting in here for one specific suggestion. Please continue your discussion with @statman .

Do not perform an initial experiment only to determine if you have curvature in the response. This will be wasteful with your fixed budget. You can address it in one experiment.

Please consider how many interaction effects are reasonable to expect. You have 8 factors. Do you expect 8 choose 2 effects? Probably not. Perhaps you expect up to 5 of them, but you cannot be sure which ones a priori. In that case, enter all of them in the Model section, but change their Estimability to If Possible. Then add 3-4 runs per expected effect, so 5 times 3-4 means adding 15-20 runs for interactions.
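The arithmetic behind that rule of thumb can be sketched in a few lines of Python. This is only an illustration: the variable names are made up here, and the figure of 5 active interactions is the guess from the post, not something known about the process.

```python
from math import comb

n_factors = 8

# Every possible two-factor interaction among the 8 factors ("8 choose 2").
n_two_factor_interactions = comb(n_factors, 2)

# Assumption taken from the post: only about 5 interactions are expected to be active.
expected_active = 5

# Rule of thumb from the post: budget 3 to 4 extra runs per expected effect.
runs_per_effect = (3, 4)
extra_runs = tuple(expected_active * r for r in runs_per_effect)

print(f"Candidate two-factor interactions: {n_two_factor_interactions}")
print(f"Extra runs to budget for interactions: {extra_runs[0]} to {extra_runs[1]}")
```

The point of the comparison is that all candidate interactions go into the model as If Possible terms, while the run budget only grows in proportion to the handful expected to be active.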

elitesky · Apr 22, 2020 03:56 PM

Thank you @Mark_Bailey, that is very useful. It seems to be a hybrid of the two options I initially mentioned.

statman · Apr 22, 2020 03:28 PM

Since you ranked 1st order then noise as the 2 most important effects, you will need to incorporate a strategy to learn about noise:

1. Incomplete blocks (RIBD or BIB): sacrifice a degree of freedom from the design factors to create the incomplete blocks

2. RCBD: Just 2 would be adequate

3. Repeats of treatments to get an estimate of the within-treatment variation (due to short-term noise such as measurement error)

4. Split-plots

Regarding curvature, think of it this way. Do you think the departure from linear will be more significant than the slope of the linear effect? Based on your other input, I might wait to understand curvature until a later iteration or as Box suggests, estimate the non-linear through iterations vs. some central composite type design.

I would sacrifice resolution, based on your input, to incorporate noise. You could run a 2^(8-4) Res. IV design in 2 incomplete blocks in 16 treatments.
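For anyone who wants to see what such a design looks like, here is a minimal sketch in plain Python. The generators (E=BCD, F=ACD, G=ABC, H=ABD) and the choice of AB as the blocking word are assumptions made for this example; they are one common textbook choice, not necessarily what JMP or the poster would use. The letters A to H are placeholders for the 8 factors, with -1/+1 standing for the two levels of each.

```python
from itertools import combinations, product

BASE_FACTORS = "ABCD"
# Assumed generators for a 16-run, 8-factor fraction (a common textbook choice).
GENERATORS = {"E": "BCD", "F": "ACD", "G": "ABC", "H": "ABD"}
# Assumed blocking word: the sign of A*B splits the 16 runs into 2 blocks.
BLOCK_WORD = "AB"
ALL_FACTORS = BASE_FACTORS + "".join(GENERATORS)


def word_sign(run, word):
    """Product of the -1/+1 levels of the factors named in `word` for one run."""
    sign = 1
    for letter in word:
        sign *= run[letter]
    return sign


def build_design():
    """Full factorial in A-D, generated columns for E-H, plus a block label."""
    runs = []
    for levels in product((-1, 1), repeat=len(BASE_FACTORS)):
        run = dict(zip(BASE_FACTORS, levels))
        for factor, word in GENERATORS.items():
            run[factor] = word_sign(run, word)
        run["block"] = 1 if word_sign(run, BLOCK_WORD) < 0 else 2
        runs.append(run)
    return runs


def orthogonal(runs, word_a, word_b):
    """True if the two effect columns have zero dot product over the design."""
    return sum(word_sign(r, word_a) * word_sign(r, word_b) for r in runs) == 0


design = build_design()
main_effects = list(ALL_FACTORS)
two_factor = ["".join(pair) for pair in combinations(ALL_FACTORS, 2)]

# Resolution IV check: main effects are clear of each other and of all 2FIs.
mains_clear_of_mains = all(orthogonal(design, a, b) for a, b in combinations(main_effects, 2))
mains_clear_of_2fi = all(orthogonal(design, m, w) for m in main_effects for w in two_factor)
# Blocking check: no main effect is confounded with the block contrast.
mains_clear_of_blocks = all(orthogonal(design, m, BLOCK_WORD) for m in main_effects)
block_sizes = [sum(r["block"] == b for r in design) for b in (1, 2)]

print(len(design), block_sizes, mains_clear_of_mains, mains_clear_of_2fi, mains_clear_of_blocks)
```

With these assumed generators, the block contrast is confounded with one chain of two-factor interactions (AB and its aliases), which is the degree of freedom given up to create the incomplete blocks, while all 8 main effects stay clear of both the blocks and the two-factor interactions.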

"All models are wrong, some are useful" G.E.P. Box

elitesky · Apr 22, 2020 04:02 PM

Thank you @statman, this is very insightful. I will keep these all in consideration as we progress through the project.

Mark_Bailey · Apr 22, 2020 02:33 PM

I want to add a couple of things to the important points already raised by @statman.

First of all, the main difference between the two designs is the goal that you defined for each of them. The first design must provide the best 48 observations to estimate all main effects and interaction effects. The second design must provide the best 48 observations to estimate only the main effects, a much easier task. Also, the simpler model uses fewer degrees of freedom (28 fewer, one per two-factor interaction!), so there are more error degrees of freedom, which provide better prediction variance, et cetera.
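A rough degrees-of-freedom count makes the gap concrete. The sketch below assumes each of the 8 factors contributes a single model term (true for continuous factors and for 2-level categorical ones) and that no terms beyond the intercept, main effects, and two-factor interactions are in the model; the function name is made up for illustration.

```python
from math import comb


def n_parameters(n_factors, include_two_factor_interactions):
    """Intercept + one term per factor, optionally + all two-factor interactions."""
    count = 1 + n_factors
    if include_two_factor_interactions:
        count += comb(n_factors, 2)
    return count


n_runs = 48
p_with_interactions = n_parameters(8, include_two_factor_interactions=True)
p_main_effects_only = n_parameters(8, include_two_factor_interactions=False)

# Degrees of freedom left over for estimating error in each case.
error_df_with_interactions = n_runs - p_with_interactions
error_df_main_effects_only = n_runs - p_main_effects_only

print(f"Model with 2FIs:    {p_with_interactions} parameters, {error_df_with_interactions} error df")
print(f"Main effects only:  {p_main_effects_only} parameters, {error_df_main_effects_only} error df")
```

Under these assumptions the same 48 runs leave far more error degrees of freedom for the main-effects model, which is why its prediction variance looks so much better when each design is evaluated against its own model.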

I must assume that you used the default criterion for a D-optimal design. That choice is a good omnibus criterion but it is particularly good for screening designs or any case where the emphasis is on the parameter estimation or testing. Your request for center points leads me to believe that you are screening and want to check for non-linear effects, too. If your goal is optimization of factor levels, then estimating the response, or prediction, is more important. I don't think that the choice of the criterion will matter much for the models you are considering.

P_Bartell · Apr 22, 2020 02:40 PM

In addition to the questions and topics raised by @statman and @Mark_Bailey, have you articulated the goal of the experiment? Do you have solid empirical prior knowledge of the influence of your selected factors? If not, then I'd suggest your goal is in line with screening for influential effects. In that case the second design is probably the more appropriate one, if for no other reason than that, if the phenomena of effect sparsity and heredity hold, you'll expend fewer resources to gain your insights. Then you might be able to conduct additional experimentation with the new knowledge you've gained.

If your goal is not screening (i.e., optimization or prediction), then some other design is probably more appropriate. I'm not sure I'd blanket recommend your first design, since we don't know enough about what you know, don't know, and want to learn, let alone the practical problem at hand.

cwillden · Apr 22, 2020 02:50 PM

You have to realize those efficiencies are computed based on the required model terms, and the 2-factor interactions in Design 2 would not be included in that evaluation. The prediction variance is obviously going to be much lower for the second design because you are using 48 runs to basically just model main effects. In the first design, it has to estimate main effects and second order interactions.

You are not getting apples-to-apples comparisons as is in terms of average prediction variance and optimality criteria efficiencies. You could do that by using Compare Designs and specifying the model that is used to compare both of them. It's probably worthwhile to look at the design comparison for both models (all main effects, and then all main effects + 2FI). You may not be able to do the second one if any interactions are completely confounded in Design 2. At the very least, it is a good idea to compare the designs for the main effects model because if Design 2 doesn't offer any major advantages over Design 1 in that scenario, then the choice is easy. Design 1 does better with many active interactions, and performs comparably if there are no active interactions.

I actually don't think those efficiencies are that low for each of those designs, nor is it a metric I usually pay attention to. Those are relative to hypothetical designs with characteristics that are unachievable at the given design size. For example, you cannot get a completely orthogonal design for 2-level factors except where the run size is a power of 2, but that's the only way to get 100% D-efficiency. Let's say you can get 100% D-efficiency at 32 runs, but choose 48. At 48, you will not have 100% efficiency because you cannot get a completely orthogonal design, but you can still potentially have much higher power for all model terms compared to the 32 runs. If it doesn't cost me much to do the 16 additional runs, I'm obviously going to prefer that design even if it's not completely orthogonal. I think efficiencies are much more useful in comparing 2 designs in a ratio rather than as a standalone metric. In fact, I'd be very interested in efficiency ratios for Design 1 vs. Design 2 for the main effects model.
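To illustrate the standalone-versus-ratio point, here is a toy sketch in plain Python. It is not the poster's designs: it uses a 3-factor main-effects model, an 8-run full factorial, and that same design with 2 extra runs added (an arbitrary choice for this example). It assumes the usual definition of D-efficiency, 100 * |X'X|^(1/p) / n, and a relative D-efficiency of (|Xa'Xa| / |Xb'Xb|)^(1/p) evaluated under the same model; the function names are made up here.

```python
from fractions import Fraction
from itertools import product


def determinant(matrix):
    """Exact determinant by Gaussian elimination on Fractions."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for i in range(n):
        pivot = next((r for r in range(i, n) if m[r][i] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != i:
            m[i], m[pivot] = m[pivot], m[i]
            det = -det
        det *= m[i][i]
        for r in range(i + 1, n):
            factor = m[r][i] / m[i][i]
            for c in range(i, n):
                m[r][c] -= factor * m[i][c]
    return det


def information_determinant(design):
    """|X'X| and parameter count p for an intercept + main-effects model."""
    X = [[1, *run] for run in design]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    return float(determinant(xtx)), p


def d_efficiency(design):
    """Standalone D-efficiency in percent: 100 * |X'X|^(1/p) / n."""
    det, p = information_determinant(design)
    return 100 * det ** (1 / p) / len(design)


def relative_d_efficiency(design_a, design_b):
    """Ratio form: (|Xa'Xa| / |Xb'Xb|)^(1/p), same model for both designs."""
    det_a, p = information_determinant(design_a)
    det_b, _ = information_determinant(design_b)
    return (det_a / det_b) ** (1 / p)


full_factorial = list(product((-1, 1), repeat=3))        # 8 runs, orthogonal
augmented = full_factorial + [(1, 1, 1), (-1, -1, -1)]   # 10 runs, not orthogonal

print(f"8-run D-efficiency:  {d_efficiency(full_factorial):.1f}%")
print(f"10-run D-efficiency: {d_efficiency(augmented):.1f}%")
print(f"10-run vs 8-run ratio: {relative_d_efficiency(augmented, full_factorial):.2f}")
```

In this toy case the larger design scores below 100% on its own because it is no longer orthogonal, yet the ratio against the smaller design comes out above 1, since the extra runs still add information about the same model.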

-- Cameron Willden
