lazzybug
Level III

Should I consider power analysis in DOE?

Somebody told me we need to keep power >= 80% when evaluating a DOE, but why is the power so low when using a Plackett-Burman design? For example, the power is only around 0.27 if I run 12 experiments with 11 factors. Can I trust a design with such low power? Thank you.

Victor_G
Super User

Re: Should I consider power analysis in DOE?

Hi @lazzybug,


Power is the probability of detecting a significant effect if it is actually present.

The 80% threshold is a practical rule of thumb for judging whether your screening design is likely to detect main effects efficiently, but keep in mind that the power calculation depends not only on the design (and its sample size), but also on the significance level (often 0.05), the size of the difference to be detected, and the expected signal-to-noise ratio (or RMSE) of the response(s).
So if your anticipated RMSE is lower (the response(s) are measured more precisely and with lower variance) and/or the difference you want to detect is larger, you can expect higher power for your effects.

Power may not be very relevant for optimization designs (mixture designs, space-filling designs, ...), because their primary focus is on the precision of response prediction, not on the precision of the effect estimates. But power is of primary interest for a screening design, as in your case.

 

It is difficult to say whether you can trust your design from this information alone, since that depends not only on the design itself and its assumptions (RMSE, significance level, difference to detect, ...), but also on which factors you want to screen, which ones you expect to be significant, the expected difference to detect, and so on = domain expertise.

 

Maybe one solution for you would be to compute several candidate designs, in order to compare the strengths and weaknesses of the different screening designs.
Looking at the Custom Design platform, if I fix 12 experiments for 11 continuous factors (the minimum number of runs required), I also find power for the main effects close to 0.21 (so a 21% chance of declaring an effect significant if it is present). But if I augment the number of runs to 16 (recommended by JMP, the "Default" setting), the power for the main effects rises to 0.84. If the experimental cost is not too high, increasing the sample size can help you a lot in gaining confidence in your screening ability.
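To illustrate how these inputs combine, here is a minimal sketch (not JMP's implementation; the design, anticipated coefficient, and RMSE values are invented for the example) of the standard two-sided t-test power calculation for a main effect in an orthogonal two-level design:

```python
import numpy as np
from itertools import product
from scipy.stats import t as t_dist, nct

def main_effect_power(X, coef, rmse, alpha=0.05, j=1):
    """Power to detect a coefficient of size `coef` for column j of the
    model matrix X, given residual noise `rmse` and significance `alpha`."""
    n, p = X.shape
    df = n - p                                   # residual degrees of freedom
    se = rmse * np.sqrt(np.linalg.inv(X.T @ X)[j, j])
    ncp = coef / se                              # noncentrality parameter
    t_crit = t_dist.ppf(1 - alpha / 2, df)
    # P(|T| > t_crit) when T follows a noncentral t distribution
    return 1 - nct.cdf(t_crit, df, ncp) + nct.cdf(-t_crit, df, ncp)

# Example: 2^3 full factorial (8 runs), main-effects model (4 parameters)
runs = np.array(list(product([-1, 1], repeat=3)), dtype=float)
X = np.column_stack([np.ones(len(runs)), runs])

print(main_effect_power(X, coef=1.0, rmse=1.0))  # modest power
print(main_effect_power(X, coef=2.0, rmse=1.0))  # larger signal -> higher power
```

Note that doubling the anticipated coefficient or halving the RMSE changes the noncentrality parameter in the same way, which is why the signal-to-noise ratio matters so much for power.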

Hope this will help you,

Victor GUILLER
L'OrƩal Data & Analytics

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)
Phil_Kay
Staff

Re: Should I consider power analysis in DOE?

I agree with everything that @Victor_G has said. I just want to repeat that the power is only meaningful when:

  1. You know the size of the signal that you need to detect AND...
  2. You have good estimates of the experimental and response measurement noise (RMSE)

(You also need to decide on a significance level)

It is very common to be missing one or both of these things. In those cases, power can still be useful but only as a way to compare alternative designs.

There is not much point in worrying about whether power is less than 80% if you are just using the default signal and noise estimates in JMP.

I have attached some slides that I created to explain how power works in JMP DOE.

I hope that helps,

Phil

 

 

lazzybug
Level III

Re: Should I consider power analysis in DOE?

@Victor_G and @Phil_Kay, thank you so much for your explanations. They are very useful for understanding power and RMSE. I have one further question. For my experiment, I just want to use a screening design to detect which of many factors are significant for my response; after that I will use a response surface design to optimize the selected factors. Before the experiment, I don't know the RMSE or what coefficient to set up in the power analysis. Can you explain more about how I should design my experiment? Thank you so much again.

 

I just noticed that this is a great place to ask questions. I wish I had known about this community earlier.

Victor_G
Super User

Re: Should I consider power analysis in DOE?

Hi @lazzybug,

Since your goal is to detect which factors are significant from a large list of factors, you'll definitely use a screening design.

From what I read, there are two main designs that may work very well in your case, depending on the type and number of factors:
- D-optimal custom design (through the "Custom Design" platform: specify your factors and ranges, and JMP will find the optimal design for your experimental space). A very convenient design platform, offering a lot of flexibility in the types of factors, the possibility to restrict randomization (when you have hard-to-change factors), and the possibility to add covariates, blocking factors, ...
- Definitive Screening Design (DSD), when you want to screen many factors (useful for 5+ factors) in a very efficient way. This type of design has restrictions: only continuous or two-level categorical factors, and no possibility to add a blocking factor (for example, if you can only measure 10 experiments per equipment run, you might want a blocking factor of 10 measurements per run to account for variability between equipment runs). But if only a small number of factors turn out to be significant, you may be able to screen interactions and quadratic effects with this design as well, which makes it very efficient. Importantly, the main effects are uncorrelated (orthogonal) with each other, and also with two-factor interactions and quadratic effects.

You can also try a more classical approach (like Plackett-Burman screening designs), but these classical designs can sometimes be hard to fit to a specific problem.

In the end, the choice of design will be the best compromise you can find between your experimental setup (number and type of factors, constraints, maximum number of experiments, ...) and the design's specifications/characteristics. I would highly recommend trying different designs and run numbers, and using the "Compare Designs" platform (DOE -> Design Diagnostics -> Compare Designs) to get a good overview of each design's strengths and weaknesses, so you can decide which design and settings make the best compromise for your specific subject.

Whichever design you choose, once your screening is done and the significant factors/main effects are identified, you can then augment your design to create a fully custom response surface for the optimization of these factors.

Some resources about Definitive Screening Designs:
Definitive Screening Design: Simple Definition, When to Use - Statistics How To
Definitive Screening Design - JMP User Community
Powerful Analysis of Definitive Screening Designs: Taking Advantage of Their Spe... - JMP User Commu...

Hope this will help you,

Victor GUILLER
L'OrƩal Data & Analytics

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)
lazzybug
Level III

Re: Should I consider power analysis in DOE?

Hi Victor, thank you so much for your material. I will introduce DSDs to our group for optimization work in the future.

 

Could you please explain my questions about DoE?

 

1. I know about Type I and Type II errors in statistics, but how do we apply them in DOE? For example, type I error (alpha): the chance we reject H0 when H0 is true; type II error (beta): the chance we accept H0 when Ha is true. For a particular DOE, what are H0 and Ha? Let's say we have 6 factors; should we have 6 null hypotheses, each one like H0: Level (-1) = Level (+1), Ha: Level (-1) != Level (+1)?

 

2. Do you have an example showing how I can manually calculate the power for main effects, interactions, and quadratic effects based on alpha = 0.5 and RMSE = 1, as shown in JMP DOE? I want to calculate the power manually in Excel.

 

Thank you so much for your help again. 

 

Victor_G
Super User

Re: Should I consider power analysis in DOE?

Hi @lazzybug,

I will answer your questions as best I can:

1. In DoE, you perform hypothesis testing at a "macro" level (the model itself) and at a "detailed" level (the significance of each parameter). You can see this in the "Fit Least Squares" report when modeling responses from a DoE: a p-value is given for the overall model (under the "Actual vs. Predicted" visualization), giving you an idea of how significant your model is versus the mean of the values, and p-values are calculated for each factor in the "Effect Summary" table.

  • At the "macro" level (entire model), the null hypothesis would be: changes in the response are not correlated with changes in the factors (all factors have estimates close to 0). The alternative hypothesis would be: changes in the response are correlated with at least one factor (at least one factor has an estimate significantly different from 0).
  • At the "detailed" level (for each factor), the null hypothesis would be: this factor has no impact on the outcome (Level (-1) = Level (+1)). The alternative hypothesis: this factor has an impact on the outcome (Level (-1) != Level (+1)).

An interesting resource on this topic: The Basics of Experimental Design for Multivariate Analysis - JMP User Community

 

2. I'm not sure exactly where or how this is documented, but you may find answers here:

Power Calculations (jmp.com)

Prospective Sample Size and Power (jmp.com)

You can also check the different calculators and simulations in the Sample Data Index (Help -> Sample Data -> Teaching Resources) or in the Statistics Index, and then search for "power".

@Phil_Kay and other JMP technical experts will be able to complete or correct my answers.
I hope this will help you,

Victor GUILLER
L'OrƩal Data & Analytics

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)

Re: Should I consider power analysis in DOE?

Adding to @Victor_G's explanation, the 'macro' view is specifically addressed by the F-ratio in the Analysis of Variance table. The null hypothesis is that only the intercept parameter is non-zero. The alternative hypothesis is that not only the intercept is non-zero. The 'micro' view is either the term-level test provided by Effect Tests or the parameter-level test provided by Parameter Estimates. The effect tests address the whole term, whereas the parameter estimate tests address the individual parameters. They are the same unless you have a categorical factor with more than two levels. The effect test is perhaps more useful in model selection, because you typically remove a term, not a parameter. The effect test is based on the F-ratio and the parameter estimate test is based on the t-ratio. The t-ratio is (estimate - hypothesized value) / standard error of the estimate. The hypothesized value is zero (no effect) for these tests.
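To make the relationship between the t-ratio and the F-ratio concrete, here is a small sketch (the design and data are invented for illustration, not taken from JMP) that fits a main-effects model by least squares and checks that, for a two-level factor, the squared parameter t-ratio equals the effect F-ratio and both tests give the same p-value:

```python
import numpy as np
from itertools import product
from scipy.stats import t as t_dist, f as f_dist

rng = np.random.default_rng(1)

# 2^3 full factorial, main-effects model: intercept + 3 two-level factors
runs = np.array(list(product([-1, 1], repeat=3)), dtype=float)
X = np.column_stack([np.ones(len(runs)), runs])
beta_true = np.array([10.0, 2.0, 0.5, 0.0])
y = X @ beta_true + rng.normal(0, 1, size=len(runs))

n, p = X.shape
df = n - p
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
mse = resid @ resid / df                 # residual mean square

# Parameter-level test: t-ratio = (estimate - 0) / SE(estimate)
se = np.sqrt(mse * np.diag(XtX_inv))
t_ratio = beta_hat / se
p_t = 2 * t_dist.sf(np.abs(t_ratio), df)

# Term-level test for factor 1: the F-ratio equals t^2 for a 1-df term
f_ratio = t_ratio[1] ** 2
p_f = f_dist.sf(f_ratio, 1, df)

print(t_ratio[1] ** 2, f_ratio)          # identical for a two-level factor
print(p_t[1], p_f)                       # identical p-values
```

This mirrors the point above: for a 1-degree-of-freedom term (a continuous or two-level factor), the effect test and the parameter test are the same test in two notations.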

Phil_Kay
Staff

Re: Should I consider power analysis in DOE?

@Victor_G 's answers are really good.

 

On question 2, I would add that you can't calculate power based only on alpha = 0.5 and RMSE = 1. As I said in an earlier reply, you also need an estimate of the signal size that you need to detect.

 

I would also add that calculating the power in Excel is not going to be easy. I would not recommend trying this.

 

If you really want to understand power, I would recommend running some simulations instead.

1. Create your experimental design

2. Determine a model to calculate your response for each run, with a parameter for each "active" factor effect (in JMP these are the "anticipated coefficients")

3. Add random noise to each response value (e.g. with a standard deviation of 1 and a mean of 0)

4. Analyse the simulated data to determine whether the active effects are significant at your chosen level of alpha (0.05 is most often used; 0.5 would be a strange choice!)

5. Repeat steps 3 and 4 many times with new random noise.

6. Count the number of times that each active effect is declared significant

 

You should find that the proportion of times that the active effects are declared significant matches the power calculated by JMP.
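The steps above can be sketched as a short simulation (a minimal illustration, not JMP Pro's simulation feature; the design, coefficients, and noise level are invented for the example):

```python
import numpy as np
from itertools import product
from scipy.stats import t as t_dist

rng = np.random.default_rng(0)
alpha, rmse, n_sim = 0.05, 1.0, 2000

# Step 1: 2^3 full factorial design, main-effects model
runs = np.array(list(product([-1, 1], repeat=3)), dtype=float)
X = np.column_stack([np.ones(len(runs)), runs])
n, p = X.shape
df = n - p

# Step 2: anticipated coefficients; factors 1 and 2 are "active", factor 3 is not
beta = np.array([10.0, 2.0, 2.0, 0.0])

XtX_inv = np.linalg.inv(X.T @ X)
t_crit = t_dist.ppf(1 - alpha / 2, df)
hits = np.zeros(p)
for _ in range(n_sim):
    # Step 3: simulate a response with fresh random noise
    y = X @ beta + rng.normal(0, rmse, size=n)
    # Step 4: fit the model and test each coefficient
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    se = np.sqrt(resid @ resid / df * np.diag(XtX_inv))
    # Steps 5-6: count how often each effect is declared significant
    hits += np.abs(b / se) > t_crit

power = hits / n_sim
print(power[1:])  # empirical power per factor; the inactive factor ~ alpha
```

The empirical proportions for the active factors should match an analytical power calculation for this design, and the "power" for the inactive factor should sit near the alpha level, since rejections there are false positives.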

 

This kind of simulation is very easy to do in JMP Pro. In fact, it is used in situations where a priori power estimation is not possible, for example where the response is binary. I once used this to help a marketing company determine the power of an experiment to understand factors affecting the response (respond / don't respond) to their promotions!

 

It would take more effort, and would require some scripting, to do this kind of simulation in standard JMP. But I still think that would be a better use of time than trying to calculate power in Excel.