
Discussions

Solve problems, and share tips and tricks with other JMP users.
Statexplorer
Level III

Effort for each DOE

Hi,

I’ve conducted several Design of Experiments (DOE), but a recurring challenge is the large number of required trials. For instance, when using a screening design with 7 experimental runs, we typically test 4 to 6 replicates per run to evaluate reproducibility. This results in a total of 28 to 42 trials per DOE.

However, after running all these trials, the model performance often turns out to be unsatisfactory, with R² values around 0.55 to 0.59. Moreover, due to wide confidence intervals and inconsistent parameter estimates, we struggle to identify a reliable or repeatable model. In many cases, even when the Y data is collected under controlled conditions, the model lacks consistency and fails to provide conclusive insights, so we end up collecting a lot of process data from each experiment.

This is becoming increasingly frustrating. How can we effectively address this issue and reduce wasted effort while improving model reliability?

7 REPLIES
P_Bartell
Level VIII

Re: Effort for each DOE

A few thoughts for you:

 

1. In screening mode try not to get too hung up on model performance...in screening mode your primary goal is identifying active effects/factors, with less emphasis on a model and its effectiveness regarding prediction. Ask yourself this question: "Have we identified factors that are worthy of additional investigation?" If yes, proceed accordingly. If not, start to ask other questions...like:

   a. Did we pick levels that aren't wide enough for the signal to rise above the noise?

   b. Are there lurking noise variables that are influencing our responses...if so how will we handle them in the future...blocking?

   c. Are there other factors we need to think about?

 

2. With all those replicates you should be able to get a nice estimate of pure error...which, if big enough, will start to overwhelm the signal. What do you know about your measurement system variation with respect to the responses? If it's big enough, maybe you need to work on reducing measurement system noise?
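To make the pure-error idea concrete, here is a minimal sketch in Python (pandas/statsmodels assumed; the factor names, response name, and numbers are all made up for illustration):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Fabricated example: 4 treatment combinations of two factors, 3 replicates each
df = pd.DataFrame({
    "X1": [-1, -1, -1, 1, 1, 1, -1, -1, -1, 1, 1, 1],
    "X2": [-1, -1, -1, -1, -1, -1, 1, 1, 1, 1, 1, 1],
    "Y":  [5.1, 5.4, 4.9, 7.2, 6.8, 7.5, 4.2, 4.6, 4.0, 8.1, 7.7, 8.4],
})

# Main-effects screening model
fit = smf.ols("Y ~ X1 + X2", data=df).fit()

# Pure error: variation among replicates of the same treatment combination
groups = df.groupby(["X1", "X2"])["Y"]
ss_pure = groups.apply(lambda y: ((y - y.mean()) ** 2).sum()).sum()
df_pure = sum(len(g) - 1 for _, g in groups)

# Lack of fit: whatever residual variation pure error does not account for
ss_lof = fit.ssr - ss_pure
df_lof = fit.df_resid - df_pure

print(f"pure error MS:  {ss_pure / df_pure:.3f}")
print(f"lack-of-fit MS: {ss_lof / df_lof:.3f}")
# A lack-of-fit MS much larger than the pure-error MS points at the model form;
# a large pure-error MS points at run-to-run noise or the measurement system.
```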

 

I'm sure others will chime in...and as always, if you can share an experiment or two, maybe we can help more?

Statexplorer
Level III

Re: Effort for each DOE

Thanks @P_Bartell 

But what should be the minimum R² we need to have as part of screening designs?

   a. Did we pick levels that aren't wide enough for the signal to rise above the noise? - How do we find these?

   b. Are there lurking noise variables that are influencing our responses...if so how will we handle them in the future...blocking? - How do we find these?

   c. Are there other factors we need to think about? - How do we find these?

 

2. With all those replicates you should be able to get a nice estimate of pure error...which, if big enough, will start to overwhelm the signal. What do you know about your measurement system variation with respect to the responses? If it's big enough, maybe you need to work on reducing measurement system noise? - I know the basics of MSA and its effects on responses. So how do we use these at the DOE level?

P_Bartell
Level VIII

Re: Effort for each DOE

In my opinion there is no such thing as a minimum R^2 value for any analysis of any design. R^2 is but one number that helps articulate the variability accounted for by a specific model. The goal of your analysis is not to get 'good statistics'...but to satisfy your experimental goals and objectives. So as long as those goals and objectives are answered/satisfied...I couldn't care less about R^2 or any other statistic. With regard to the other three questions you pose, 'how do I find these'...those are addressed by domain knowledge, experimental conduct, past experience, and any first principles knowledge you can bring to the problem at hand. So let me start and try to help.

 

1. With respect to levels, if you are in a position where you truly do not know where to set levels, then DOE is probably premature. I hate to recommend it...but if you have no idea, perhaps it's time for one-factor-at-a-time trials to see if you can find those levels.

 

2. Lurking factors...here is where domain knowledge plays a big role. For example, maybe you have to run the experiment in two different 'machines'...that you hope are the same...but you want to guard against the situation where they aren't identical with respect to how they influence the responses. Here's where some form of blocking comes into play. Or some factors are hard to change...like temperature in a vessel, so you'd like to restrict randomization on temperature, running all the low values together, followed by the high values. In this situation what's called a 'split plot' design (just one form of blocking) is recommended. Think about the experiment and the conditions surrounding it, and use your domain knowledge to guard against, or at least account for, these lurking factors. Also, I always tried to be present whenever the experiment was being conducted, just to 'watch' for anything that might be suspicious. (A small sketch of a blocked analysis follows at the end of this list.)

 

3. Way back when you were planning the experiment did you spend some time brainstorming ALL the factors that might be influential? Hopefully, 'yes', and you made a list. And kept that list. Maybe it's time after analysis to revisit that list and try some other factors?

 

4. As for measurement system variation, the rule of thumb we often used was that measurement system variation should be at most 1/4 of the signal variation we were trying to find. Your situation might be completely different. (A quick numeric check follows below.)
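As a quick numeric check of that rule of thumb, reading it as: measurement-system standard deviation at most a quarter of the smallest effect you care about (Python; the numbers are hypothetical):

```python
# Hypothetical numbers: sigma from a gauge/MSA study vs. the smallest
# effect worth detecting, both in response units
sigma_ms = 0.6   # measurement-system standard deviation
signal = 2.0     # smallest effect of practical interest

ratio = sigma_ms / signal
print(f"measurement-to-signal ratio: {ratio:.2f}")
if ratio > 0.25:
    print("Measurement noise likely swamps effects this size; improve the gauge first.")
else:
    print("Measurement system looks adequate for effects this size.")
```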
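And a minimal sketch of the 'two machines' blocking idea at analysis time (Python/statsmodels; the factor names, machine labels, and numbers are hypothetical; a true split-plot would treat the whole plot as a random effect rather than a fixed block):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical 8-run experiment split across two machines
df = pd.DataFrame({
    "Temp":    [-1, -1, 1, 1, -1, -1, 1, 1],
    "Time":    [-1, 1, -1, 1, -1, 1, -1, 1],
    "Machine": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Y":       [4.8, 5.5, 7.0, 7.9, 4.1, 5.0, 6.4, 7.2],
})

# The categorical block term absorbs the machine-to-machine offset so it
# doesn't inflate the residual error used to judge Temp and Time
fit = smf.ols("Y ~ Temp + Time + C(Machine)", data=df).fit()
print(fit.summary())
```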

 

Hope this helps?

Victor_G
Super User

Re: Effort for each DOE

Hi @Statexplorer,

 

@P_Bartell gave a lot of useful information and advice. I would like to expand on some of the points mentioned, as you seem to run experiments in settings with relatively high noise.

Have you tried to analyze the possible causes of the noise/high uncertainty? Model inadequacy (missing interaction terms or quadratic effects?), high measurement error, nuisance variables (day-to-day, operator, or process variability?)? Assessing the source of the noise would make it easier to recommend appropriate mitigation strategies:

  • In case of model inadequacy, what model do you often use in screening mode (main effects only)? Do you see specific patterns in the residuals and/or a lack-of-fit test that may indicate problems?
    Are you expecting strong interaction effects and/or polynomial or non-linear effects? Have you tried other (non-linear) modeling methods: for example, machine learning algorithms suitable for your dataset size and complexity, like Bootstrap Forest or SVMs?
  • In case of high measurement error, can you expand the ranges of the factors (so that the signal can be relatively higher than the noise)? Can you also add repeated measurements, either in addition to the replicates, or by lowering the number of independent replicate runs and compensating with repeated measurements, then modeling the response mean and variance of the repeated measurements for each run?
  • In case of known nuisance variables, perhaps you could add some safeguards in your design generation, like blocking, to avoid the influence of operator, day-to-day variability, change of measurement equipment, etc. Using a random block variable enables capturing the variance of this nuisance variable out of the total experimental variance, preventing that variance from "hiding" some potentially active effects (a minimal sketch follows this list). If the nuisance variables are known and can be handled/modified (for example, process variables), you could also use Taguchi designs to create a robust design, or use the Custom Design platform: Experiments for Robust Process and Product Design
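As mentioned in the last bullet, a minimal sketch of the random-block idea (Python/statsmodels; "Day" as a hypothetical nuisance variable; with only two days and made-up numbers this is purely illustrative):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical design run over two days; Day is a nuisance variable
df = pd.DataFrame({
    "X1":  [-1, 1, -1, 1, -1, 1, -1, 1],
    "X2":  [-1, -1, 1, 1, -1, -1, 1, 1],
    "Day": ["d1", "d1", "d1", "d1", "d2", "d2", "d2", "d2"],
    "Y":   [4.9, 7.1, 4.3, 8.0, 4.4, 6.6, 3.8, 7.5],
})

# Random intercept per day: day-to-day variance is estimated separately
# instead of being lumped into the residual that tests X1 and X2
fit = smf.mixedlm("Y ~ X1 + X2", data=df, groups=df["Day"]).fit()
print(fit.summary())

# For repeated measurements within a run, one can likewise summarize and
# model the per-run mean and variance, e.g.:
# df.groupby(["X1", "X2"])["Y"].agg(["mean", "var"])
```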

 

Hope this answer may contribute to the discussion,

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)
Statexplorer
Level III

Re: Effort for each DOE

Thanks @Victor_G 

 

No, we never tried assessing the cause of the noise, because so many processes were involved before making a final cell.

 

I don't see any specific patterns in the residuals.

 

As it is the first DOE for this process (though not my first DOE), I don't expect any polynomial or interaction effects. We have not tried any machine learning methods.

 

Blocking is one thing we have not done yet, but how would it add an advantage when all cells are done on the same day, on the same instruments, by the same person, using the same materials?

 

statman
Super User

Re: Effort for each DOE

Unfortunately, I can't provide specific advice as there is not enough context.  These issues can't be addressed globally. Here are my initial thoughts:

1. I'm not aware of any screening designs with 7 runs. Did you mean 8 (a 2^(7-4) Resolution III design)? (A construction sketch follows at the end of this list.)

2. I don't understand the number of replicates.  First, these are replicates (independent running of the same treatment combinations) and not repeats?  One of the things you "sacrifice" with experimental design is stability.  If you are looking for understanding of stability issues, sampling and COV studies would be the tool of choice.  You can't really get stability when running 4-6 replicates anyway.

3. It sounds like you have lots of noise.  Perhaps you need a better strategy to handle the noise (e.g., RCBD, BIB, split-plots)

4. With basically zero knowledge of the situation, I would suggest starting with COV/sampling before DOE. (A variance-decomposition sketch follows below.)

5. R^2 is only one of several statistics you should use to evaluate the model (and it is not the best one). (A metrics sketch follows below.)
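For reference, that 8-run design can be written down directly; a minimal sketch in Python (numpy only; using the standard generators D=AB, E=AC, F=BC, G=ABC):

```python
import numpy as np
from itertools import product

# Full 2^3 factorial in the three base factors A, B, C
base = np.array(list(product([-1, 1], repeat=3)))
A, B, C = base[:, 0], base[:, 1], base[:, 2]

# Remaining four factors aliased onto interactions: D=AB, E=AC, F=BC, G=ABC
design = np.column_stack([A, B, C, A * B, A * C, B * C, A * B * C])
print(design)  # 8 runs x 7 factors, entries +/-1
```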
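On point 4, a sketch of what a simple COV/sampling look at the process could produce before any factors are changed (Python; the column names and numbers are hypothetical):

```python
import pandas as pd

# Passive sampling of the process: 4 measurements on each of 3 days
df = pd.DataFrame({
    "Day": ["d1"] * 4 + ["d2"] * 4 + ["d3"] * 4,
    "Y":   [5.0, 5.2, 4.9, 5.1, 5.8, 6.0, 5.7, 5.9, 4.6, 4.5, 4.8, 4.7],
})

n = 4  # measurements per day
within = df.groupby("Day")["Y"].var().mean()                 # within-day variance
between = df.groupby("Day")["Y"].mean().var() - within / n   # method-of-moments estimate
print(f"within-day variance:  {within:.3f}")
print(f"between-day variance: {max(between, 0.0):.3f}")
# A dominant between-day component suggests blocking on day in the DOE
```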
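And on point 5, a few complementary statistics available from an ordinary least-squares fit (Python/statsmodels; data made up for illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({"X1": [-1, -1, 1, 1, -1, 1],
                   "Y":  [4.9, 5.1, 7.0, 7.4, 5.0, 7.2]})
fit = smf.ols("Y ~ X1", data=df).fit()

print(fit.rsquared, fit.rsquared_adj)  # R^2 and adjusted R^2
print(np.sqrt(fit.mse_resid))          # RMSE, in the units of the response
print(fit.conf_int())                  # confidence intervals on the estimates
print(fit.pvalues)                     # term-level p-values
```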

"All models are wrong, some are useful" G.E.P. Box
Statexplorer
Level III

Re: Effort for each DOE

Thanks @statman 

 

Sorry, it was not 7; it was 10 runs using a custom design, and only two continuous variables were changed.

 

3. It sounds like you have lots of noise. Perhaps you need a better strategy to handle the noise (e.g., RCBD, BIB, split-plots) - How can I use these in JMP DOE?

4. With basically zero knowledge of the situation, I would suggest starting with COV/sampling before DOE. - How do I do a COV study and choose the right number of samples before DOE? Usually I keep 38-52 as the sample size for the whole DOE (that is, the cells performed for the whole DOE).

5. R^2 is only one of several statistics you should use to evaluate the model (and it is not the best one). - So what are the other best ones to evaluate the model? Could you please tell me? I always look for residual patterns, CIs, lack of fit, parameter estimates, and sometimes normalized RMSE.
