Solved: Re: Assistance with DoE

Ressel · Apr 26, 2024 6:59 AM

I am trying to set up a DoE in my organization. This is the first time we are using this approach, there are a couple of people watching, and I want to get this right to underline the value of DoE and JMP. Therefore, I hope it is acceptable to post this question here. Example tables are attached.

This is the situation. We have:

4 continuous responses:
- Yield_A
- Yield_B
- Impurity_A
- Impurity_B
4 continuous factors:
- Reagent concentration
- Reaction time
- Reaction temperature
- Reagent-to-reactant ratio
2 covariates for the raw material to be used for testing, both continuous
- Concentration_compound_X
- Concentration_compound_Y
One 2-level nominal factor, which I presume can be interpreted both as blocking factor and covariate. It is suspected to influence the response, and it is directly tied to Concentration_compound_X and Concentration_compound_Y via a batch ID:
- ProcessType

We want to create a DoE for a model considering main effects, all 2-way interactions, the two covariates, and the ProcessType. Note again that the raw material used in this experiment can come from two different processes (Type_1 and Type_2). Each raw material batch has its own unique set of Concentration_compound_X and Concentration_compound_Y levels, which will be used as covariates in the design.
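To make the size of that model concrete, here is a minimal Python sketch that counts its terms. It assumes the 2-way interactions are only among the four design factors, and that the covariates and ProcessType enter as main effects only; that reading is an assumption, not something stated above. The count is a lower bound on the number of runs needed to estimate every term, not the run count JMP would suggest.

```python
from itertools import combinations

# Names taken from the post; the model structure is an assumed reading of it.
factors = [
    "Reagent concentration",
    "Reaction time",
    "Reaction temperature",
    "Reagent-to-reactant ratio",
]
covariates = ["Concentration_compound_X", "Concentration_compound_Y"]
nominal = ["ProcessType"]  # 2 levels, so it costs one parameter

# All 2-way interactions among the four design factors only (assumption).
two_way = [f"{a}*{b}" for a, b in combinations(factors, 2)]

terms = ["Intercept"] + factors + two_way + covariates + nominal
n_parameters = len(terms)

print(f"{len(two_way)} two-way interactions, {n_parameters} parameters in total")
```

Any design for this model needs at least as many runs as parameters, plus extra runs if an error estimate is wanted.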

I am having issues understanding how best to add the covariates and ProcessType to the design. I am not able to add ProcessType as a blocking variable. After initial investigation we decided to go for a custom design. My understanding fails me after having loaded the four continuous responses and the four continuous factors into the DoE dialogue. This is where the saga begins.

Case 1 (Adding both covariates and ProcessType as “easy to change” covariates):

Factors and responses are already in the dialogue window. Then, I press the “Select Covariate Factors” button and add the covariates as shown below. The covariate table has 900 rows, and each row represents an individual batch.

I leave all factors at the default setting “Changes = Easy”:

After adding the interactions, the design ends up with 900 runs.

Clearly, this is a bogus design. No one will perform 900 runs, at least not voluntarily.

Case 2 (Adding both covariates and ProcessType as “hard to change” covariates)

In this example, the covariates and ProcessType were all added with role “Covariate” again, but factor change was set to hard. This creates a whole plot design with default 45 runs:

JMP allows the user to reduce the number of runs further, which is good. However, there are no center points in the design and I also can’t add replicates …

Thoughts & questions:

Is case #2 an acceptable design?
Between the two cases above, the sensible design obviously is case #2 but I am wondering why JMP doesn’t allow me to add center points and/or replicates.
Setting factor changes to hard for the two covariates and ProcessType appears realistic, as it is possible that manufacturing will be stuck with a single raw material batch. But is this choice correct? We can pick almost any batch we like for the experiment and the only reason why factor change was set to hard is to avoid generating a 900-run design.
I am a little bit surprised with the 900 runs generated in case #1. Am I misled in believing the design generating algorithm should pick a suitable number of batches and runs? Or is it my responsibility as user to tell JMP the number of runs I find suitable, evaluate the design and decide whether I need to improve on it by adding more batches from the covariate table?

Thank you for enlightenment!

statman · Apr 26, 2024 12:14 PM

I don't understand the process or situation well enough to provide specific advice, but here are some things to think about:

1. Are you trying to pick a winner or are you interested in the causal structure affecting the 4 response variables?

2. Do you have an understanding of the measurement system variation/capability for each of the 4 responses?

3. If you are interested in understanding causal structure, and robustness to the incoming material concentrations, I would likely recommend sequential experimentation. Start with a large design space (lots of factors set at bold levels, large inference space).

4. What is your predicted rank order of model effects (including 1st, 2nd order linear and 2nd order non-linear)?

5. Specifically regarding covariates. If you go this route, you will be creating a mixed model (fixed and random effects model). I might suggest you start with the block (process type). You might try treating this block as both a fixed effect and a random effect. While you can still measure the concentrations and record those values for later analysis, I wouldn't start with covariates in the model. Adding covariates can greatly increase the number of DF's you "need". Have you thought of studying the concentrations of incoming materials with sampling? Knowing how much they vary is useful in determining how useful covariates can be. Remember you are putting 1 value for each treatment. What if that value is not representative of the concentration? What about additional measurement errors? What about lag effect?

6. Design multiple options, for example:

4 factors in half fraction res IV design run in two complete blocks, 16 treatments
4 factors run in 2 incomplete blocks, res III, 8 treatments
4 factors run in 2 incomplete blocks res V, 16 treatments
4 factors full factorial in 2 complete blocks, full res, 32 treatments
Add multiple center points to any of the above (a DF to test curvature in the design space and possibly to assess stability over the design space)
Add repeats to any of the above (within treatment noise, measurement errors)

For each possible design, determine what can be learned (what is the model, what is confounded, size of the inference space) and contrast that with resources required. Predict all possible outcomes and what you will do in each instance. Choose a design that meets your criteria and prepare to iterate.
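As one way to see "what is confounded" for the first option, the sketch below builds a 2^(4-1) half fraction in coded units and lists which two-factor interactions share a column. The generator D = ABC and the letters A to D are assumptions chosen for illustration; they are the usual textbook choice for a resolution IV half fraction, not something prescribed in this thread.

```python
from itertools import combinations, product

def half_fraction():
    """Eight coded runs of a 2^(4-1) design with the assumed generator D = ABC."""
    return [
        {"A": a, "B": b, "C": c, "D": a * b * c}
        for a, b, c in product((-1, 1), repeat=3)
    ]

def column(runs, term):
    """Contrast column for a term such as 'A' or 'AB' (product of factor columns)."""
    col = []
    for run in runs:
        value = 1
        for name in term:
            value *= run[name]
        col.append(value)
    return tuple(col)

runs = half_fraction()
main_effects = list("ABCD")
two_way = ["".join(pair) for pair in combinations("ABCD", 2)]

# Two terms are aliased when their contrast columns are identical.
aliases = [
    (s, t) for s, t in combinations(two_way, 2)
    if column(runs, s) == column(runs, t)
]

for s, t in aliases:
    print(f"{s} is aliased with {t}")
```

In this fraction no main effect shares a column with a two-factor interaction, but the two-factor interactions are aliased in pairs, which is what resolution IV means. Running the eight treatments in each of two complete blocks gives the 16 treatments mentioned in the list.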

"All models are wrong, some are useful" G.E.P. Box

Ressel · Apr 26, 2024 04:39 PM

@statman, as always very useful!

A few constraints:

We are pressed for time and lack the competence to compare the different design options with regards to their strengths and weaknesses.
As sad as it is, I am the person responsible for this effort. Alas, I went into this with the JMP mantra "DoE is always better than OFAT" in my head. Your comments make me worry and sap the confidence right out of my brain.

In response to your bullets:

1. Actually, we're trying to understand the individual effects and interactions of four continuous factors on the response. Optimization is a bonus, but my understanding is that with DoE at least some optimization is likely to be feasible almost regardless of the design details once a major design category (custom vs. response surface) has been chosen.
2. Yes.
3. That is an interesting thought. For now, though, we only want to understand the effect of the factors and possibly optimize in the short term. We are convinced that the variability of the raw material has an influence on the response.
4. We are very confident that the factors proposed all have an effect. We are just not sure how big it is in each case. The ranking is not entirely clear and we wanted to go into this agnostically, which is why we initially considered a screening design. A JMP screening design, though, doesn't permit covariates or interactions in the DoE setup, which is why we moved on to the Swiss army knife solution "custom design".
5. Regarding covariates. We want them in the design because we have firm reason to assume that they will affect some of the responses.
   1. I wouldn't be able to confidently treat experimental blocks as fixed and random effects (more reading is required; see "A few constraints").
   2. Yes, we have thought about studying the concentrations of the incoming raw material. In fact, we know its approximate composition. If the values are not truly representative, this is also interesting information because it uncovers one source of variation that can add noise to our DoE data. This is one of the reasons why we want to set it up economically and responsibly, so we can blame lack of accurately knowing our raw material composition for noise later on. To achieve this we wanted to include the covariates.
6. Not that I don't want to do this, but besides the time constraint vs. the in-depth reading that is recommended, selling this in and presenting it as part of the decision process towards the experimental design is challenging. I am not that good at acting. An infinitely easier sell would be to suggest a DoE with vs. without covariates (many vs. few runs).

Questions, and more thoughts (if I may):

Where possible, in direct reference to your bullets.

Following your comments, I am not convinced anymore that what I had planned per case #2 is any good at all. Now I am thinking "response surface" but:
1. this also appears to preclude covariates from being used in the design (at least in JMP).
2. with four continuous factors the default minimum (Box-Behnken) design has 27 runs (without considering the covariates as factors in their own right)
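The 27 runs can be reproduced by counting. A Box-Behnken design for four factors takes every pair of factors through its four corner combinations while holding the other two at their mid level, then adds center points. The sketch below assumes three center points, which is inferred from the 27-run total rather than stated above.

```python
from itertools import combinations, product

def box_behnken(n_factors, n_center):
    """Coded Box-Behnken runs: a 2^2 factorial for each factor pair, plus center points."""
    runs = []
    for i, j in combinations(range(n_factors), 2):
        for a, b in product((-1, 1), repeat=2):
            row = [0] * n_factors  # all other factors stay at the mid level
            row[i], row[j] = a, b
            runs.append(tuple(row))
    runs.extend([(0,) * n_factors] * n_center)
    return runs

# n_center=3 is an assumption inferred from the 27-run total.
design = box_behnken(n_factors=4, n_center=3)
print(len(design))  # 6 pairs x 4 corners + 3 center points
```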

6. Now I am tempted to run a whole custom design for only one of the two raw material types (which I had, of course somewhat stupidly labeled as ProcessType, since their difference stems from differences in the manufacturing process).
This would allow inclusion of the covariates and estimation of all 2-way interactions in "only" 22 runs. But then again, the inclusion of covariates doesn't allow center points ...
Not considering the covariates, we'd be able to estimate all main effects and 2-way interactions for one raw material bx in even fewer runs, permitting perhaps one or two center points for accommodating curvature. This is comparatively cheap and could help warm management up to the idea of further experimentation once they have developed a taste for the design profiler.
I'd probably be able to sell this in, arguing that we a.) only have a limited budget and b.) can later on try replicating these results with a few select raw material batches from the second ProcessType.

My biggest worry: morale. Should we give up and revert to OFAT!? Hopefully, this is the worst possible thing we could do. If not, I am doomed.

statman · Apr 26, 2024 04:58 PM

Sorry, my advice suggests you should really get someone with experience to guide you through the design selection process. That is if you don't want to waste time and money. While experimentation is likely always better than OFAT, DOE does take planning. Remember the biggest issue with OFAT is the inference space. Holding factors constant while you manipulate one allows you to draw conclusions on that one that are contingent on where the other factors are set. If any of those change in the future, your conclusions may be invalid. This goes for noise as well.
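A small numeric illustration of the inference-space point, using a purely hypothetical response function with an interaction between two coded factors A and B. The coefficients are invented for illustration and are not estimates from any process in this thread.

```python
def response(a, b):
    """Hypothetical response in coded units; coefficients are made up for illustration."""
    return 10 + 2 * a + 3 * b + 4 * a * b

def effect_of_a(b_fixed):
    """OFAT-style effect of A: change in response from A = -1 to A = +1 with B held fixed."""
    return response(1, b_fixed) - response(-1, b_fixed)

effect_at_low_b = effect_of_a(-1)   # what OFAT sees if B happens to be held low
effect_at_high_b = effect_of_a(1)   # what OFAT sees if B happens to be held high

# A 2x2 factorial runs all four combinations, so it recovers both of these.
main_effect_a = (effect_at_low_b + effect_at_high_b) / 2
interaction_ab = (effect_at_high_b - effect_at_low_b) / 2

print(effect_at_low_b, effect_at_high_b, main_effect_a, interaction_ab)
```

With B held low, A appears to lower the response; with B held high, it raises it. An OFAT study run at only one setting of B would report one of those two answers as "the" effect of A.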

I wouldn't get caught up with covariates. Just add a column for each covariate to whatever design you choose. Record a value of each covariate for each treatment. Analysis can be done subsequently to handle the covariate(s).

DO NOT TRY TO GET EVERYTHING IN ONE EXPERIMENT.

I would start the investigation with a RCBD (Randomized Complete Block Design). I would run a fractional factorial on the 4 design factors. Res IV is a good place to start. Set them at bold, but reasonable levels. For the block, confound the different concentrations for both X and Y. For the first block (-1), select one of the ProcessTypes and choose low levels of concentration for X and Y (this may require some sampling to get an estimate of concentration before the experiment is run). For the second block (1), use the other ProcessType and set levels of concentration to the high side. I wouldn't worry about curvature yet. Run the first block, analyze the data. What did you learn? Run the second block, compare the results of the first and second block. What did you learn? Were the effects found in the first block the same as the second block? Or do they change? If your results are similar between both blocks, you might say the 4 design factor effects are robust to incoming concentrations, if not you'll have to iterate to create a robust process.
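The plan described above can be sketched as a run sheet. This is a minimal Python sketch under several assumptions: the half fraction uses the generator "fourth factor = product of the first three", Type_1 is arbitrarily assigned to the first block, the run order is randomized within each block, and the covariate columns are left empty so that the measured values can be recorded per treatment.

```python
import random
from itertools import product

FACTORS = (
    "Reagent concentration",
    "Reaction time",
    "Reaction temperature",
    "Reagent-to-reactant ratio",
)

def half_fraction():
    """Coded 2^(4-1) treatments; fourth factor = product of the first three (assumed generator)."""
    return [(a, b, c, a * b * c) for a, b, c in product((-1, 1), repeat=3)]

def build_run_sheet(seed=1):
    """Two complete blocks of 8 treatments, each randomized separately."""
    rng = random.Random(seed)
    # Which ProcessType goes in which block is arbitrary here (assumption).
    blocks = ((-1, "Type_1", "low"), (1, "Type_2", "high"))
    sheet = []
    for block, process_type, xy_level in blocks:
        treatments = half_fraction()
        rng.shuffle(treatments)  # randomize run order within the block
        for levels in treatments:
            row = {"Block": block, "ProcessType": process_type, "Target X/Y level": xy_level}
            row.update(zip(FACTORS, levels))
            # Covariates are recorded per treatment and analyzed afterwards.
            row["Concentration_compound_X"] = None
            row["Concentration_compound_Y"] = None
            sheet.append(row)
    return sheet

run_sheet = build_run_sheet()
for row in run_sheet:
    print(row)
```

Because ProcessType and the X/Y concentration level change together with the block, their effects cannot be separated from each other in this experiment; the comparison is between the two blocks as a whole.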

"All models are wrong, some are useful" G.E.P. Box

Ressel · Apr 26, 2024 05:14 PM

Thanks! I'll probably be able to successfully convey the main points of this message. And, I am relieved; no OFAT.

Ressel · Aug 7, 2024 2:51 AM

@statman, I wanted to thank you again for your advice. We have now finished our first ever DoE in our company, all the while observing closely your recommendations. While we did not get any additional input from statistics experts, it still went very well. So well, in fact, that I think management will be keen to see this experimental approach applied to different areas.

statman · Aug 6, 2024 05:54 PM

Great! Congrats on your first (of many) experiments. Hopefully the results are encouraging enough to continue. Remember this very important point: Every time you plan and run an experiment there are two significant opportunities:

1. What did you learn about the "process" that benefits the company AND

2. What did you, yourself learn about experimentation.

The second is long-term thinking and may be more valuable.

"All models are wrong, some are useful" G.E.P. Box