I wanted to design an experiment to screen 18 factors against one response. I now understand that there are more modern, better designs to solve this problem, but at the time I used a 20-run Plackett-Burman design. I completed each run in duplicate, for 40 runs in total. I am attaching a .csv of the anonymised results, where I have standardised the factor levels (subtracted mean and divided by standard deviation).
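In case it helps anyone reproduce the preprocessing, the standardisation is just a z-score per factor column; a minimal sketch in Python (the file name is a placeholder for my anonymised .csv):

```python
import pandas as pd

df = pd.read_csv("results.csv")        # hypothetical file name
factors = list("ABCDEFGHIJKLMNOPQR")   # the 18 anonymised factors

# z-score each factor: subtract the column mean, divide by the column SD
df[factors] = (df[factors] - df[factors].mean()) / df[factors].std()
```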
I understand that because Plackett-Burman is a saturated design, I shouldn't try to fit a linear model, because with so many terms it is almost certain to be overfit, so I should use 'two-level screening' instead (although maybe this changes because I have duplicates?). I used this option, adding all 18 factors A-R as Xs and the response as Y. Screenshot of results below:
I expected A, M and D especially to have an effect, but I would find it quite surprising if F and K were to have significant effects. Therefore I thought I would check here whether I am interpreting these results properly. I'm especially confused by the fact that JMP adds 'Null' terms as part of the two-level screening. This page in the documentation describes the rationale:
The process continues until n effects are obtained, where n is the number of rows in the data table, thus fully saturating the model. If complete saturation is not possible with the factors, JMP generates random orthogonalized effects to absorb the rest of the variation. They are labeled Null n where n is a number. For example, this situation occurs if there are exact replicate rows in the design.
I do have replicate rows in the design, but when I look at the plot above it seems to me that factors A and M are separate from the other factors, and that factors other than A and M only look significant because of the null terms being added.
Why would I question these results, apart from the fact that it would be surprising if certain factors had effects? From the JMP documentation: "The analysis of screening designs depends on the principle of effect sparsity, where most of the variation in the response is explained by a small number of effects". I'm wondering if this principle is being violated here, given how many factors are coming out as significant?
Many thanks in advance for any help.
I'm using JMP Pro 17.0.0
Hi @noahsprent,
Three principles guide the construction and analysis of (model-based) Design of Experiments: effect sparsity (most of the variation is explained by a small number of effects), effect hierarchy (lower-order effects are more likely to be important than higher-order ones), and effect heredity (interactions are more likely to be active when their parent main effects are active).
In a screening design like a Plackett & Burman design, two of these principles are heavily emphasized: effect sparsity and effect hierarchy, since you only investigate main effects out of a very large number of possible effects in the model.
With 18 factors, your model could in principle contain 1 intercept, 18 main effects, 153 possible two-factor interactions, 816 possible three-factor interactions, and so on. So you have a very large number of effects to estimate and very few resources available, which means you need to focus your efforts on the most promising/significant effects.
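Just to make the combinatorics concrete, these counts are binomial coefficients; a quick check in Python:

```python
from math import comb

n = 18
print(comb(n, 2))  # 153 possible two-factor interactions
print(comb(n, 3))  # 816 possible three-factor interactions
# 1 intercept + 18 main effects + 153 + 816 = 988 terms up to third order
```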
The Plackett & Burman design is a screening design that helps you extract the most information about main effects from a small number of runs, relying on the effect principles mentioned above.
Repeating the analysis you have done gives me similar results with different methods (see file attached):
However, at this stage I wouldn't add any two-factor interactions like A*M to the model, since this term is aliased with main effects that are already part of the model from the design creation:
At this stage I would consider keeping only the significant main effects in the model, and then carry out a design augmentation if you want to identify relevant and significant two-factor interactions.
I hope this (long) answer may help you,
To clarify a misconception stated by @Victor_G, the interaction term A*M is not aliased with any main effects. An alias would be identical to another term in the model. Aliases are also said to be 'confounded.'
Aliases occur when two effects are perfectly correlated. That is not the case here. Entering aliases in the same model produces a 'singularity.' A*M can be estimated along with the other terms. A*M is correlated with other terms, but its estimate would suffer inflated variance and, hence, wider confidence intervals and lower power.
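To illustrate the difference, here is a sketch showing that a two-factor interaction column in a 20-run PB design is partially correlated with other main-effect columns, rather than perfectly correlated with any of them. This assumes the third-party pyDOE2 package for design generation, and the choice of columns for A and M is arbitrary:

```python
import numpy as np
from pyDOE2 import pbdesign  # assumes the pyDOE2 package is installed

X = pbdesign(18)             # 20-run Plackett-Burman design, columns = factors A..R
a, m = X[:, 0], X[:, 12]     # treat column 0 as A and column 12 as M (labels arbitrary)
am = a * m                   # the A*M interaction column

# Correlation of A*M with each main-effect column: mostly nonzero but never +/-1,
# i.e. partial correlation (inflated variance), not full aliasing (singularity).
corr = np.array([np.corrcoef(am, X[:, j])[0, 1] for j in range(X.shape[1])])
print(np.round(corr, 2))
```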
First, I have a question: what do you mean by "replicate rows in the design"? Are these truly replicates (independently and randomly run) or are they repeats (multiple data points for the same treatment, acquired while the treatment combination is held constant)?
PB designs are intended to be screening designs for main effects (they essentially assume higher-order effects are negligible). A saturated PB design would have a main effect assigned to every DF (e.g., a 20-run PB would have 19 factors). The confounding in a PB design is partial and very complex. They are not necessarily intended for sequential iterations, but to screen out insignificant factors efficiently. YMMV
Analysis of the results using normal/half-normal plots (first introduced by Daniel) was developed for unreplicated experiments. It is a great way to analyze unreplicated designs, since there is no unbiased MSE available for testing statistical significance. For normal plots to work, the model must be saturated (every DF assigned). Since you have what may be randomized replicates, it is impossible to assign every DF to an effect, so JMP assigns the Null terms as placeholders for those DFs. You might look at ANOVA, since you may have unbiased estimates of the MSE.
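For reference, Lenth's pseudo standard error, which underlies the significance cutoffs on these plots, is easy to compute from the saturated set of contrasts; a minimal sketch of the standard Lenth (1989) formulas, nothing JMP-specific:

```python
import numpy as np
from scipy.stats import t

def lenth_pse(contrasts):
    """Lenth's pseudo standard error from a saturated set of effect contrasts."""
    c = np.abs(np.asarray(contrasts, dtype=float))
    s0 = 1.5 * np.median(c)                  # initial robust scale estimate
    return 1.5 * np.median(c[c < 2.5 * s0])  # re-estimate after trimming large effects

def lenth_margin(contrasts, alpha=0.05):
    """Margin of error: contrasts larger than this in magnitude are flagged."""
    m = len(contrasts)
    return t.ppf(1 - alpha / 2, m / 3.0) * lenth_pse(contrasts)  # d = m/3 df
```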
Dear all,
Firstly, thank you very much for your responses, and apologies for my delayed reply. I wanted to try to better understand some of the concepts that you mentioned before responding, and then had to prioritise another task. I have had a look at the papers from Lenth and Daniel, and then the later paper from Larntz and Whitcomb extending this to designs with some replication. Forgive me, my background is not in statistics! As I understand it:
So I used 'Fit Model' in JMP, with only the main effects included, as @Victor_G pointed out (I think @Mark_Bailey is saying that the design would perhaps allow me to investigate interactions, but that I'd have lower power due to correlations. At this stage I'm not really interested in anything beyond main effects, however). I saw that the residuals in the "Residual by Predicted" plot appeared slightly heteroscedastic (the physical properties of the system I'm investigating would support this; I'd expect the variance to be proportional to the response, rather than constant for all responses), so I log10-transformed the response, reran 'Fit Model', and obtained the following ANOVA table:
(Reports for both 'Fit Model' analyses attached.)
This seems to agree reasonably closely with the "Fit 2-level screening" platform results.
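In case it's useful to anyone reading this later, the equivalent main-effects fit on the log10 response outside JMP would look roughly like this (a sketch using statsmodels; the file name and the response column 'y' are placeholders for my anonymised data):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("results.csv")     # hypothetical file name
df["log_y"] = np.log10(df["y"])     # variance proportional to response -> log scale

# Main-effects-only model, mirroring the 'Fit Model' setup in JMP
formula = "log_y ~ " + " + ".join(list("ABCDEFGHIJKLMNOPQR"))
fit = smf.ols(formula, data=df).fit()
print(fit.summary())                # coefficient table with per-factor t-tests
```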
As @statman pointed out, PB is a screening design and should be used to identify factors which aren't significant so they can be removed in future iterations. In my case it seems that most of my factors did have an effect and warrant further investigation, assuming that the size of the effect has real-world, and not just statistical, significance.
Does this seem like a sensible approach to you all? Many thanks again for your help.
Best wishes,
Noah
EDIT: I should add that I did consider ANOVA/linear regression in the first instance, but was concerned about overfitting with the number of factors that I have, and thought that the screening platform might be better at dealing with this. I'm not sure if that was an incorrect assumption!