DoE Model issues Augmentation

I-love-jmp · Oct 5, 2023 5:37 AM

Dear JMP community,

We are currently setting up DoE models based on two experimental blocks.

The first block was a Custom DoE Design with 22 runs containing main effects and quadratic effects from 7 factors. The second block was an augmented design with 16 runs containing main effects, quadratic effects and all possible two-factor interactions (for a total of 38 runs split into 2 blocks).

In the second block, for two of the factors the factor levels were wider than in the first block.

Additionally, 7 two-factor interactions that were rated less likely to occur were set to “if possible” in the model dialog. All other effects were kept at “necessary”.

Usually, we set up the models with the Fit Model Platform considering only the model effects coming from the design by clicking “run” (personality: standard least squares, emphasis: effect screening) and then manually removing the non-significant model effects from the models.

We encountered the following situation: in some cases we get an error message saying “The model is missing an effect”.

Stepwise evaluation (personality: stepwise) in the forward direction, with either AICc or BIC, yields models with significant effects in both cases.

Our questions to you would be:

In which form is the “if possible” rating considered in the stepwise models? If we see a significant effect that was previously rated as “if possible”, that means the power of the design was good enough to detect it. But if we do not see an “if possible” effect, how do we know that it was considered?
What is the reason that the manual way reports issues whereas the stepwise ways work?
The BIC criterion yields much larger models than the AICc criterion. What would be the reason to trust the much smaller AICc models?

Many thanks in advance to all of you, best regards,

C. and A.

Victor_G · Oct 5, 2023 8:42 AM

Hi @I-love-jmp,

Welcome to the Community!

The error message you have seen may appear when you break the principle of heredity, for example by introducing a quadratic effect or two-factor interactions without the related main effects. DoE is based on three principles: effect sparsity, effect heredity and effect hierarchy. More information and details are in this answer: DoE Principles - JMP Community

Usually, making sure all main effects are present should help avoid this message.

To be sure this message is linked to this problem, it would be better to have more context, perhaps a datatable that reproduces this problem (with anonymized data) ?

The "if possible" setting is used for design creation, and may impact the analysis. It basically tells JMP (and the coordinate exchange algorithm behind it) how to prioritize the terms you want to investigate, the relative precision required of their estimates, and how to allocate resources (design points) to maximize what you learn (about necessary effects and if possible effects).

Sometimes points may be allocated to estimate these effects, sometimes it is not possible, due to restrictions in the experimental space and/or experimental budget (number of runs) asked by the user. "If possible" effect terms don't increase the number of runs required, so you end up by default with the same number of runs as if you didn't enter these effects in the Model panel. That means in the analysis, you may or may not be able to estimate these effects depending on the allocation of points and aliases in the design, no matter your method of analysis.

More technical details here : Designs with If Possible Effects (jmp.com)
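As a rough illustration of why estimability can become tight here, the terms described in the question can be counted against the 38 runs. This is a minimal sketch in Python (not JMP output). It assumes that the block enters as a single fixed-effect parameter and that the model matrix has full rank; neither is confirmed in the thread.

```python
from math import comb

n_factors = 7
n_runs = 22 + 16  # original design + augmented runs, as stated in the question

# Parameters in the full model described in the question.
# Assumption: the two-level block is one fixed-effect parameter.
terms = {
    "intercept": 1,
    "block": 1,
    "main effects": n_factors,
    "quadratic effects": n_factors,
    "two-factor interactions": comb(n_factors, 2),
}
n_if_possible = 7  # interactions set to "if possible" in the question

n_params_full = sum(terms.values())
n_params_necessary = n_params_full - n_if_possible

# Degrees of freedom left for the error estimate in each case.
df_error_full = n_runs - n_params_full
df_error_necessary = n_runs - n_params_necessary

print("full model:", n_params_full, "parameters,", df_error_full, "error df")
print("necessary only:", n_params_necessary, "parameters,", df_error_necessary, "error df")
```

Under these assumptions the full model is close to saturated, so standard errors and p-values may be unstable until some terms are removed.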

Stepwise modeling does not assume effect heredity and effect hierarchy by default; it is an "agnostic" method that tries to find the best model based on a criterion (AICc, BIC, p-values, ...). So you may end up with different models than with a Standard Least Squares approach (which does respect all DoE principles). In a dataset created by DoE I would clearly try to respect these principles, and not remove a main effect (even if statistically non-significant) if higher-order effects containing this factor are still present.

So Stepwise is working, even if you break effect heredity or hierarchy principle, as it is just evaluating a lot of possible models and using the terms that best improve the model.
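The kind of check that can trip up a manually edited model can be sketched as follows: for every higher-order term, verify that all lower-order terms built from the same factors are still in the model. This is a minimal Python sketch, not JMP's implementation; the term syntax ("X1*X2", and "X1*X1" for a quadratic effect) and the function name are hypothetical.

```python
from itertools import combinations


def missing_lower_order_terms(model_terms):
    """Return the lower-order terms that the model would need in order
    to contain every sub-term of each of its higher-order terms."""
    # Represent each term as a sorted tuple of factor names.
    present = {tuple(sorted(t.split("*"))) for t in model_terms}
    missing = set()
    for term in present:
        # All proper sub-terms: main effects, then lower-order interactions.
        for size in range(1, len(term)):
            for sub in combinations(term, size):
                if sub not in present:
                    missing.add(sub)
    return sorted("*".join(m) for m in missing)


# X2 and X3 were removed although X1*X2 and X3*X3 are still in the model.
reduced_model = ["X1", "X1*X2", "X3*X3"]
print(missing_lower_order_terms(reduced_model))

# A model that keeps all parent terms passes the check.
complete_model = ["X1", "X2", "X3", "X1*X2", "X3*X3"]
print(missing_lower_order_terms(complete_model))
```

An empty list means no lower-order term is missing from the specification.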

As you're dealing with a dataset with a specific structure and data generation, I would not recommend using data mining/"agnostic" modeling methods like Stepwise, except if you want to compare your previous modeling with new insights from other approaches.

It's a strange situation, as BIC generally penalizes additional model terms more heavily than AICc does, due to the differences in their calculation: https://www.jmp.com/support/help/en/17.1/index.shtml#page/jmp/likelihood-aicc-and-bic.shtml
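One possible contributor, offered as an assumption rather than a diagnosis of this dataset, is the small-sample correction in AICc. Using the usual definitions AICc = -2LL + 2k + 2k(k+1)/(n-k-1) and BIC = -2LL + k ln(n), the sketch below compares only the penalty parts for n = 38 runs. The function names are illustrative.

```python
import math


def aicc_penalty(k, n):
    """Penalty part of AICc for k estimated parameters and n observations."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n - k - 1 <= 0")
    return 2 * k + 2 * k * (k + 1) / (n - k - 1)


def bic_penalty(k, n):
    """Penalty part of BIC for k estimated parameters and n observations."""
    return k * math.log(n)


n = 38  # total number of runs in the question

# Smallest k at which AICc penalizes a model more heavily than BIC.
crossover = next(
    k for k in range(1, n - 1) if aicc_penalty(k, n) > bic_penalty(k, n)
)

for k in (5, 10, 20, 30):
    print(k, round(aicc_penalty(k, n), 1), round(bic_penalty(k, n), 1))
print("AICc penalty exceeds BIC penalty from k =", crossover)
```

With few runs and many candidate terms, the AICc correction grows quickly as k approaches n, so AICc can favor smaller models than BIC even though BIC is the stricter criterion in large samples.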

It is however not uncommon; I was able to reproduce your situation very easily.

In the end, it's "only" an information criterion that guides you in the modeling. What is interesting is to compare the terms on which they agree and the terms on which they differ, evaluate the different possible models with other metrics (statistical significance of the model, R² and adjusted R², RMSE of the model ...) depending on your goal, and base your decision on all this information AND your domain expertise!

I hope this answer will help you,

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)

statman · Oct 5, 2023 10:00 AM

A quick clarification: the error is a result of not following the principle of hierarchy (not heredity).

"All models are wrong, some are useful" G.E.P. Box

Mark_Bailey · Oct 6, 2023 11:55 AM

This reply is not an argument but simply a point of clarification. Various people use the terms 'hierarchy' and 'heredity' interchangeably. Here is the way JMP uses these terms:

• Hierarchy of effects – First-order effects represent the most variation and second-order effects account for less variation.
• Heredity of effects – Higher-order terms usually involve factors with significant main effects.

Victor_G · Oct 6, 2023 01:46 PM

Thanks @Mark_Bailey !

I had the same definitions in mind and in the response I mentioned in my first reply.

This is why I thought the error message was a direct consequence of breaking the heredity principle (because it affects the model structure before the analysis, which is then not in accordance with the data generation process/design structure) and not of breaking the hierarchy principle (as that affects the analysis and the partitioning of variation once the model is launched).

Can you confirm and/or explain ?

Victor GUILLER


statman · Oct 6, 2023 03:14 PM

The error is due to violating the principle of hierarchy. The error occurs when there are higher order terms specified in the analysis, but the lower order terms are missing. You cannot get an error for heredity. Heredity is used to assist in understanding confounded higher order terms. For example, let's say we have run a Res IV fractional factorial and during analysis we find significant effects. The effects are a function of aliased terms (e.g., a main effect is aliased with a 3rd order interaction, 2-factor interactions are aliased with other 2-factor interactions).

X4= X1*X2*X3 (design generator); I=X1*X2*X3*X4

X1*X2=X3*X4

We can use the hierarchy principle as justification to conclude that the main effect, rather than a 3rd order effect, is likely the influential one. Hierarchy does not help for an effect that consists of several aliased 2nd order effects (they are at the same level of the hierarchy). To understand which of the 2-factor interactions is most likely contributing to the significant effect (remember they are aliased, so there is no math to help), we can apply the heredity principle. We look to see if any of the main effects are significant, and if so we have evidence that the 2-factor interaction that includes the significant main effect is the likely contributor. Example:

X1 is significant (X1=X2*X3*X4)

X1*X2 is significant (X1*X2=X3*X4); we suspect it is X1*X2, as X1 is also active.
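The aliasing in this example can be checked numerically. The sketch below builds the 8-run half fraction from the generator X4 = X1*X2*X3 in coded units (-1/+1) with NumPy and confirms that the aliased columns are identical, which is why no calculation can separate them.

```python
import itertools

import numpy as np

# Full factorial in X1, X2, X3 (coded -1/+1): 8 runs.
base = np.array(list(itertools.product([-1, 1], repeat=3)))
x1, x2, x3 = base.T

# Design generator from the example: X4 = X1*X2*X3.
x4 = x1 * x2 * x3

# Defining relation I = X1*X2*X3*X4: the product is +1 in every run.
defining_relation_holds = bool(np.all(x1 * x2 * x3 * x4 == 1))

# Main effect X1 is aliased with the 3-factor interaction X2*X3*X4.
main_aliased = np.array_equal(x1, x2 * x3 * x4)

# 2-factor interactions are aliased in pairs, e.g. X1*X2 = X3*X4.
two_fi_aliased = np.array_equal(x1 * x2, x3 * x4)

print(defining_relation_holds, main_aliased, two_fi_aliased)
```

Because the X1*X2 and X3*X4 columns are the same, any estimate belongs to their sum; deciding which one is active relies on the heredity argument above, not on the data.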


statman · Oct 5, 2023 10:19 AM

Unfortunately I do not understand your specific situation, but I am confused by your approach. Blocking is a technique used to handle NOISE associated with the design space. Noise consists of the factors you are not willing to manage or control in the process. Instead of holding noise constant and negatively impacting inference space, noise is confounded with the block. This increases the inference space while not negatively impacting the precision of the design. You can run complete blocks (RCBD) which are complete replicates of the design, or incomplete blocks (BIB) which fractionate the block.

There are, of course, multiple methods for designing and analyzing experiments. No one approach is best in all situations. There are a number of factors that affect design selection (e.g., noise strategy, constraints, # of factors, desired resolution (both linear and non-linear), desired effects to be estimated, restrictions on randomization, etc.) and these should be considered prior to selecting a design. When selecting a design, you should consider what effects can be estimated, which will be confounded and which will be restricted. This is the potential knowledge to be gained. This knowledge should be balanced with the resources required.

My bias is to analyze experiments with a subtractive model building approach. That is, I start with a saturated model and remove insignificant terms. This is because I have a model in mind when the experiment is designed. When using historical/observational data, I start with an additive approach (e.g., stepwise) as I am looking for clues to develop hypotheses that ultimately will be tested in an experiment.

Lastly, always start with a practical assessment of the data. Did the response variables change enough (practical significance) in the study to warrant further investigation? Next comes graphical analysis (multiple plots of the data, looking for patterns in the response variable and matching patterns in the x's). Finally comes quantitative analysis, and there are a number of statistics used here, as suggested by Victor.
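One step of such a subtractive approach, combined with the hierarchy principle discussed earlier in the thread, might look like the sketch below. It is an illustration only: the p-values are made up, the 0.05 threshold and the function names are assumptions, and in practice the model must be refit after every removal because the p-values change.

```python
def contains(higher, lower):
    """True if `higher` is a higher-order term built from all factors in `lower`."""
    h, l = higher.split("*"), lower.split("*")
    return len(h) > len(l) and all(f in h for f in l)


def next_term_to_drop(p_values, alpha=0.05):
    """Pick the least significant term that no remaining higher-order term
    depends on. Returns None when nothing can be dropped."""
    candidates = [
        t for t in p_values
        if not any(contains(other, t) for other in p_values)
    ]
    worst = max(candidates, key=p_values.get, default=None)
    if worst is None or p_values[worst] <= alpha:
        return None
    return worst


# Hypothetical p-values from one fit (not from the thread's data).
p_values = {"X1": 0.80, "X2": 0.01, "X3": 0.40, "X1*X2": 0.03}

# X1 has the largest p-value but is protected by X1*X2, so X3 goes first.
print(next_term_to_drop(p_values))

# After removing X3 nothing else qualifies: X1*X2 is significant and
# keeps X1 and X2 in the model.
remaining = {t: p for t, p in p_values.items() if t != "X3"}
print(next_term_to_drop(remaining))
```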


Victor_G · Oct 5, 2023 8:25 AM

Hi @statman,

Thanks for the clarification and correction.

Concerning the blocking: as @I-love-jmp mentioned that the design was augmented, I think this means it was augmented with the option "Group new runs into separate block", which adds a blocking factor to the design with one block for the original design and one block for the augmented runs. This may be a good and "safe" practice, particularly when sets of experiments are separated in time, with possible calibration errors and/or shifts/drifts in the measurement systems.

Entirely agree with you about the modeling mindsets between DoE data (and an a-priori model) vs. observational data (and a data mining/model-agnostic approach).

Victor GUILLER


I-love-jmp · Oct 9, 2023 07:45 AM

Dear Victor_G, statman, and Mark_bailey,

First of all, thank you very much for the advice, explanations and discussions you provided about our questions. We really appreciate your input!

Sorry for the late response to all of you. Concerning the blocking factor, Victor_G explained that very well. The factor was introduced to account for the variance that might be (or is likely to be) introduced by the separation of the blocks in time (and there are other, most likely significant, factors besides time that add to the blocking).

Concerning the error message during model generation: Considering your explanations about heredity and hierarchy we think you nailed the point here and we were able to circumvent the error message.

To give you more context for the DoE and the underlying performed experiments, here is the Augment design

Please note that for two factor interactions containing factor “a”, the estimability in the model was set to “if possible” since these interactions were rated unlikely to be significant based on subject matter expertise.

What we see now in the Fit Model Platform with the standard least squares approach is something like this (yellow = main effects, quadratic effects, two factor interactions rated as necessary; blue = two factor interactions rated as if possible).

Usually, we would go ahead and remove the terms from bottom to top until only significant terms remain in the model. The issue now is that, first of all, only very large p-values (besides the block) occur and, secondly, the main effects are the terms at the bottom. So we cannot follow the standard procedure “remove terms from bottom to top”. The effects hidden under the blue square are the “if possible” interactions.

If we delete, for example, one of the “if possible” interactions, nothing visibly changes. But when we have deleted 3 of them (it doesn’t matter which ones), suddenly the model looks like this. So from here on we could basically go on with the “bottom to top” removal approach.

The question now is how to proceed in generating the model. We cannot just delete three of the “if possible” interactions, since these might be significant terms, as illustrated here: some of them (blue squares) remain in the model with significant p-values.

At this point we are quite confused, maybe you can give us advice on why this happens and how we can solve this issue?

Thank you very much for taking the time again, best regards,

C and A

Victor_G · Oct 9, 2023 5:36 AM

Hi @I-love-jmp,

This does not directly answer your questions, but you might try to set up the Block factor as "Random", as you're more interested in the change in variability due to block (time and other uncontrolled factors) than in a change in mean response due to block (which you won't be able to reproduce).

When creating your model, click on "Block" and then on the red triangle next to Attributes to specify this term as a random effect.

This will create a Mixed model with Block as a Random effect, and its significance will be evaluated based on the change in the response variability, not the response mean. This should also help other factors to "express" themselves, as they won't be compared to the block factor directly.

I will answer later about other questions (if still needed),

Victor GUILLER

