djw238
Level II

DOE center points vs. historical process targets

I have a general question about the use of center points vs. historical process target values in DoE studies, as outlined in the scenario below:

 

When deciding on the values to use for the high and low levels of the factors in a DoE study, we found that the high/low levels that span the range of interest are not exactly symmetric around the current historical target values for some factors. When adding multiple center points to such a design, would it be acceptable to use the current target values for those factors rather than the exact geometric center points?

 

My understanding is that the purpose of including multiple center points is to get a measure of the variability and also to determine whether any curvature is present. Is there any harm or disadvantage to using slightly offset "center points" for some factors to achieve these objectives? The preference for using the historical target values rather than the exact center points stems from the desire to compare against historical data. Any thoughts or advice?
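
For concreteness, here is a minimal sketch (hypothetical factor names and values, not from the post) of the kind of design being described: a two-factor factorial whose levels are not symmetric about the historical target for one factor, with the replicate "center" runs placed at the historical targets rather than at the geometric midpoints.

```python
from itertools import product

low_high = {"temp_C": (60.0, 80.0), "pH": (6.5, 7.5)}    # hypothetical ranges
historical_target = {"temp_C": 72.0, "pH": 7.0}          # not the midpoint for temp_C

# 2^2 factorial corners plus four replicate "center" runs at the historical targets
factorial_runs = [dict(zip(low_high, combo)) for combo in product(*low_high.values())]
center_runs = [dict(historical_target) for _ in range(4)]

for run in factorial_runs + center_runs:
    print(run)
```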

 

 

Accepted Solution

Re: DOE center points vs. historical process targets

You can do whatever you like. You won't break anything.

 

You are collecting data to fit an empirical model either to test hypotheses about effects or to predict the response under new conditions. This case is a regression problem. The (sole) purpose of a designed experiment is to provide the optimal data for the regression. The information in the data comes out through the model. You can use an ad hoc method or a principled method to design your data collection. The design methods based on statistical principles are meant to provide optimal data collection under a given set of specifications.

 

The common symmetric distribution of factor levels is a result of either a combinatorial design method (e.g., factorial designs) or an optimal design method when the linear regression model is first or second order. The two or three levels, respectively, are the optimal levels. But if you design an experiment for a third-order linear model, the resulting four optimal levels are not evenly spaced. Who knew? They are still symmetric about the center, but they are not evenly spaced over the range.
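
A small sketch of that last point, assuming a single factor coded on [-1, 1] and a cubic polynomial model: four evenly spaced levels give a smaller |X'X| (less information) than the classic D-optimal levels at ±1 and ±1/√5, which are symmetric about the center but unevenly spaced.

```python
import numpy as np

def d_criterion(levels, order=3):
    """|X'X| for a one-factor polynomial model of the given order at the given levels."""
    X = np.vander(np.asarray(levels, dtype=float), N=order + 1, increasing=True)
    return np.linalg.det(X.T @ X)

evenly_spaced = [-1.0, -1/3, 1/3, 1.0]
d_optimal     = [-1.0, -1/np.sqrt(5), 1/np.sqrt(5), 1.0]   # symmetric, unevenly spaced

print(d_criterion(evenly_spaced))  # ~1.11
print(d_criterion(d_optimal))      # ~1.31 -- larger |X'X|, i.e., more information
```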

 

So adapting the design will reduce its optimality but probably not by all that much. It certainly won't break. The center points in your case are meant to also serve as control runs. You cannot specify that to the method, so make the optimal design and then move the center points to where you want them. The new center points will still provide the information to estimate the random errors independent of the model and, in turn, test for lack of fit (assuming you are fitting a first order model).
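
As a hedged illustration (made-up numbers in coded units; the replicate location (0.2, 0) is only an example of a moved center point), the replicates still give a model-independent pure-error estimate that can be compared with the residual from a first-order fit as a lack-of-fit check:

```python
import numpy as np

# 2^2 factorial plus four replicate runs moved off the geometric center (coded units)
X_raw = np.array([[-1, -1], [ 1, -1], [-1,  1], [ 1,  1],
                  [0.2, 0.0], [0.2, 0.0], [0.2, 0.0], [0.2, 0.0]])
y = np.array([8.1, 12.3, 9.0, 13.5, 10.4, 10.9, 10.2, 10.7])   # hypothetical responses

X = np.column_stack([np.ones(len(y)), X_raw])                  # first-order model
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
ss_resid = np.sum((y - X @ beta) ** 2)                         # 5 residual df

reps = y[4:]                                                   # the moved replicates
ss_pure = np.sum((reps - reps.mean()) ** 2)                    # pure error, 3 df
ss_lof = ss_resid - ss_pure                                    # lack of fit, 2 df

F = (ss_lof / 2) / (ss_pure / 3)
print(F)   # compare against an F(2, 3) reference distribution
```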

 

Another caution: be sure to make the factor ranges wide. Be bold. You might think that a wide range will produce some undesirable responses and so should be avoided. Producing those responses is actually a good result! The purpose of the experiment is not to find the desired conditions. The purpose is to provide the data to fit a model that will find the desired conditions. So the game is to use the factor ranges to provoke large effects (changes in the response) that are easy to detect (high power) and estimated with low standard errors (high precision, stable estimates).
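
A rough sketch of the standard-error argument, assuming a straight-line fit in one coded factor: the standard error of the slope is sigma / sqrt(sum((x - mean(x))^2)), so widening the factor range shrinks the standard error with no additional runs.

```python
import numpy as np

sigma = 1.0                                     # assumed run-to-run noise
narrow = np.array([-0.25, -0.25, 0.25, 0.25])   # timid levels (coded units)
bold   = np.array([-1.0, -1.0, 1.0, 1.0])       # bold levels, same number of runs

def slope_se(x, sigma=sigma):
    """Standard error of the slope for a straight-line fit at the given x values."""
    return sigma / np.sqrt(np.sum((x - x.mean()) ** 2))

print(slope_se(narrow))   # 2.0
print(slope_se(bold))     # 0.5 -- four times smaller for a four times wider range
```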


statman
Super User

Re: DOE center points vs. historical process targets

Some further thoughts to add to Mark's excellent ideas...

The reason for wanting to compare to historical target values may come from the following thought. Since the levels you are testing are not the current settings, if the experiment suggests a better result when you move a factor in the high (or low) direction, a common question is: is it better than the current setting? You have no such comparison in the experiment. Hence, when running initial screening designs, it is recommended to always include the current level setting as one of the levels.

Center point additions to the experiment space provide a very efficient way of testing for curvature, an estimate of error that is not biased by treatment effects, and, if you run enough of them randomly throughout the experiment, a way of examining the stability of the experiment space. Of course, to use center points, all of the factors should be quantitative/continuous.
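
A minimal sketch of the usual center-point curvature check, with made-up numbers: compare the mean of the factorial runs with the mean of the center runs, scaled against the pure error from the replicates.

```python
import numpy as np

y_factorial = np.array([8.1, 12.3, 9.0, 13.5])    # corner runs (hypothetical)
y_center    = np.array([10.4, 10.9, 10.2, 10.7])  # replicate center runs (hypothetical)

nF, nC = len(y_factorial), len(y_center)
ss_curvature = nF * nC * (y_factorial.mean() - y_center.mean()) ** 2 / (nF + nC)
ms_pure_error = y_center.var(ddof=1)              # pure error with nC - 1 df

F = ss_curvature / ms_pure_error                  # compare against F(1, nC - 1)
print(F)
```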

Why can't you set the levels equidistant from the historical targets? You are not trying to pick a winner, but to create variation that can easily be analyzed.

"All models are wrong, some are useful" G.E.P. Box
djw238
Level II

Re: DOE center points vs. historical process targets

Thank you both for your very thorough answers, much appreciated!

 

I think you both raised good points about making sure that the factor ranges are wide enough, or simply widening the factor levels so that they are equidistant from the target. The concern with making some of the factor levels too wide is, as you might have guessed, that some factor combinations could break the process so badly that a quantitative value for the responses could not be obtained. Such runs within the design would then just become pass/fail. I think this is a rather common concern that gets raised when setting ranges for DoE studies.

 

If such a situation does occur during a DoE, what would you recommend as the best course of action? Repeat just the failed runs with narrower factor ranges? Enter an extreme value as the response, or otherwise indicate in the model that the run failed? Start the DoE over with narrower factor ranges? I believe I've heard all of these possibilities suggested when trying to deal with failed runs within a DoE study.

 

Any further thoughts?  Thanks again for the advice!

statman
Super User

Re: DOE center points vs. historical process targets

Good questions.  Here are my thoughts:

1. The typical failure mode for level setting is TOO NARROW. One hypothesis is that there is a fear of making bad product (which we have been taught not to do). Remember, you are trying to create variation, not pick the winner. Bolder level settings increase the inference space with few resource ramifications. Bold, but reasonable, is my advice.

2. The other observation I have from running experiments and teaching experimentation for 35 years is that there is a desire, perhaps subconscious, to have the "best" level included in the experiment. This is not necessary for sequential, iterative work. I believe this is the effect of management wanting a quick answer rather than understanding the problem and the causal structure. Say the word iteration to management and they think time and money...

3. Advice I give is to PREDICT ALL possible outcomes of the experiment (i.e., What will you do if there is no practical change in the response variables? What will you do if there is a significant change in the response variables, but none of the factors are significant? What will you do if there is special cause variation during the experiment? What will you do if factor A is significant? What if it is not? etc.). In addition, I suggest predicting the value of the response variable for each treatment. One of the benefits of this prediction is that you can think about run order. Let's say you are uncertain about level setting. Run the treatment predicted to have the best value and the treatment predicted to have the worst value as the first two runs of the experiment. If no variation is created, you re-think the level settings of the factors. If a significant amount of variation is created, then proceed with the experiment.

4. I once asked Dr. Taguchi this very question many years ago. He said first that the treatment combination that produced no result may be the most informative treatment in the experiment. He also spent a good deal of time trying to understand what actually happened when the process failed, and created a response variable that quantified that phenomenon. In fact, I think Taguchi's thinking about response variables is one of his most significant contributions to the field.

5. Replicating just that treatment combination leads to the issue of confounding the block effect with the replicated treatment, so be cautious with this approach.

6. If you lose only one treatment, you can regress on the remaining data to estimate what that result might have been and use that value to salvage the rest of the experiment (or use the mean of the remaining data as the substitute value, which has the effect of removing that treatment effect). If you use a substitution approach, my advice is to do multiple substitutions and see to what extent they agree, as in the sketch below. If the agreement is sufficient, then go with it. If not, then you have to think about additional runs.
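
A minimal sketch of point 6 (hypothetical data and a hypothetical lost run, not from any experiment in this thread): fit the planned first-order model to the remaining runs, predict the missing response, and compare that substitute with the simple mean-of-the-rest substitute.

```python
import numpy as np

# 2^2 factorial plus two center runs (coded units); the last run failed
X_raw = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1], [0, 0], [0, 0]], dtype=float)
y     = np.array([8.1, 12.3, 9.0, 13.5, 10.4, np.nan])      # hypothetical responses

ok = ~np.isnan(y)                                            # runs that produced data
X = np.column_stack([np.ones(len(y)), X_raw])                # planned first-order model

beta, *_ = np.linalg.lstsq(X[ok], y[ok], rcond=None)
regression_sub = (X[~ok] @ beta).item()                      # substitute from the fit
mean_sub = y[ok].mean()                                      # or the mean of the rest

print(regression_sub, mean_sub)   # if the substitutes agree well, salvage the analysis
```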

 

"All models are wrong, some are useful" G.E.P. Box
djw238
Level II

Re: DOE center points vs. historical process targets

Thank you for the very detailed response and suggestions!  I found it very useful and I hope others will as well.