I wanted to design an experiment to screen 18 factors against one response. I now understand that there are more modern, better designs to solve this problem, but at the time I used a 20-run Plackett-Burman design. I completed each run in duplicate, for 40 runs in total. I am attaching a .csv of the anonymised results, where I have standardised the factor levels (subtracted mean and divided by standard deviation).
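In case it helps anyone reproduce the preprocessing, the standardisation is just a z-score per factor column; a minimal sketch in Python (the file name is a placeholder for my anonymised .csv):

```python
import pandas as pd

df = pd.read_csv("results.csv")        # hypothetical file name
factors = list("ABCDEFGHIJKLMNOPQR")   # the 18 anonymised factors

# z-score each factor: subtract the column mean, divide by the column SD
df[factors] = (df[factors] - df[factors].mean()) / df[factors].std()
```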
I understand that because Plackett-Burman is a saturated design, I shouldn't try to fit a linear model, because with so many terms it is almost certain to be overfit, so I should use 'two-level screening' instead (although maybe this changes because I have duplicates?). I used this option, adding all 18 factors A-R as Xs and the response as Y. Screenshot of results below:
I expected A, M and D especially to have an effect, but I would find it quite surprising if F and K were to have significant effects. Therefore I thought I would check here whether I am interpreting these results properly. I'm especially confused by the fact that JMP adds 'Null' terms as part of the two-level screening. This page in the documentation describes the rationale:
The process continues until n effects are obtained, where n is the number of rows in the data table, thus fully saturating the model. If complete saturation is not possible with the factors, JMP generates random orthogonalized effects to absorb the rest of the variation. They are labeled Null n where n is a number. For example, this situation occurs if there are exact replicate rows in the design.
I do have replicate rows in the design, but when I look at the plot above it seems to me that factors A and M are separate from the other factors, and that factors other than A and M only look significant because of the null terms being added.
Why would I question these results, apart from the fact that it would be surprising if certain factors had effects? From the JMP documentation: "The analysis of screening designs depends on the principle of effect sparsity, where most of the variation in the response is explained by a small number of effects". I'm wondering if this principle is being violated here, given how many factors are coming out as significant?
Many thanks in advance for any help.
I'm using JMP Pro 17.0.0
Hi @noahsprent,
Three principles guide the construction and analysis of (model-based) Design of Experiments: effect sparsity (most of the variation is explained by a small number of effects), effect hierarchy (lower-order effects are more likely to be important than higher-order ones), and effect heredity (interactions are more likely to be active when their parent main effects are active).
In a screening design like a Plackett & Burman design, two of these principles are heavily emphasized: effect sparsity and effect hierarchy, since you only investigate main effects out of a very large number of possible effects in the model.
With 18 factors, your model could in principle contain 1 intercept, 18 main effects, 153 possible two-factor interactions, 816 possible three-factor interactions, and so on. So you have a very large number of effects to estimate and very few resources available, which means you need to focus your efforts on the most promising/significant effects.
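Just to make the combinatorics concrete, these counts are binomial coefficients; a quick check in Python:

```python
from math import comb

n = 18
print(comb(n, 2))  # 153 possible two-factor interactions
print(comb(n, 3))  # 816 possible three-factor interactions
# 1 intercept + 18 main effects + 153 + 816 = 988 terms up to third order
```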
The Plackett & Burman design is a screening design that helps you extract the most information about main effects from a small number of runs, relying on the effect principles mentioned above.
Repeating the analysis you have done gives me similar results with different methods (see file attached):
However, at this stage I wouldn't add any two-factor interactions like A*M to the model, since this term is aliased with main effects that are already part of the model from the design creation:
At this stage I would consider keeping only the significant main effects in the model, and then carry out a design augmentation if you want to identify relevant and significant two-factor interactions.
I hope this (long) answer may help you,
To clarify a misconception stated by @Victor_G, the interaction term A*M is not aliased with any main effects. An alias would be identical to another term in the model. Aliases are also said to be 'confounded.'
Aliases occur when two effects are perfectly correlated. That is not the case here. Entering aliases in the same model produces a 'singularity.' A*M can be estimated along with the other terms. A*M is correlated with other terms, but its estimate would suffer inflated variance and, hence, wider confidence intervals and lower power.
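To illustrate the difference, here is a sketch showing that a two-factor interaction column in a 20-run PB design is partially correlated with other main-effect columns, rather than perfectly correlated with any of them. This assumes the third-party pyDOE2 package for design generation, and the choice of columns for A and M is arbitrary:

```python
import numpy as np
from pyDOE2 import pbdesign  # assumes the pyDOE2 package is installed

X = pbdesign(18)             # 20-run Plackett-Burman design, columns = factors A..R
a, m = X[:, 0], X[:, 12]     # treat column 0 as A and column 12 as M (labels arbitrary)
am = a * m                   # the A*M interaction column

# Correlation of A*M with each main-effect column: mostly nonzero but never +/-1,
# i.e. partial correlation (inflated variance), not full aliasing (singularity).
corr = np.array([np.corrcoef(am, X[:, j])[0, 1] for j in range(X.shape[1])])
print(np.round(corr, 2))
```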
First, I have a question: what do you mean by "replicate rows in the design"? Are these truly replicates (independently and randomly run) or are they repeats (multiple data points for the same treatment, acquired while the treatment combination is held constant)?
PB designs are intended to be screening designs for main effects (they essentially assume higher-order effects are negligible). A saturated PB design would have a main effect assigned to every DF (e.g., a 20-run PB would have 19 factors). The confounding in a PB design is partial and very complex. They are not necessarily intended for sequential iterations, but to screen out insignificant factors efficiently. YMMV
Analysis of the results using normal/half-normal plots (first introduced by Daniel) was developed for unreplicated experiments. It is a great way to analyze unreplicated designs, since there is no unbiased MSE available for testing statistical significance. For normal plots to work, the model must be saturated (every DF assigned). Since you have what may be randomized replicates, it is impossible to assign every DF to an effect, so JMP assigns the Null terms as placeholders for those DFs. You might look at ANOVA, since you may have unbiased estimates of the MSE.
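For reference, Lenth's pseudo standard error, which underlies the significance cutoffs on these plots, is easy to compute from the saturated set of contrasts; a minimal sketch of the standard Lenth (1989) formulas, nothing JMP-specific:

```python
import numpy as np
from scipy.stats import t

def lenth_pse(contrasts):
    """Lenth's pseudo standard error from a saturated set of effect contrasts."""
    c = np.abs(np.asarray(contrasts, dtype=float))
    s0 = 1.5 * np.median(c)                  # initial robust scale estimate
    return 1.5 * np.median(c[c < 2.5 * s0])  # re-estimate after trimming large effects

def lenth_margin(contrasts, alpha=0.05):
    """Margin of error: contrasts larger than this in magnitude are flagged."""
    m = len(contrasts)
    return t.ppf(1 - alpha / 2, m / 3.0) * lenth_pse(contrasts)  # d = m/3 df
```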
Dear all,
Firstly, thank you very much for your responses, and apologies for my delayed reply. I wanted to try to better understand some of the concepts that you mentioned before responding, and then had to prioritise another task. I have had a look at the papers from Lenth and Daniel, and then the later paper from Larntz and Whitcomb extending this to designs with some replication. Forgive me, my background is not in statistics! As I understand it:
So I used 'Fit Model' in JMP, with only the main effects included, as @Victor_G pointed out (I think @Mark_Bailey is saying that the design would perhaps allow me to investigate interactions, but that I'd have lower power due to correlations. At this stage I'm not really interested in anything beyond main effects, however). I saw that the residuals in the "Residual by Predicted" plot appeared slightly heteroscedastic (the physical properties of the system I'm investigating would support this; I'd expect the variance to be proportional to the response, rather than constant for all responses), so I log10-transformed the response, reran 'Fit Model', and obtained the following ANOVA table:
(Reports for both 'Fit Model' analyses attached.)
This seems to agree reasonably closely with the "Fit 2-level screening" platform results.
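In case it's useful to anyone reading this later, the equivalent main-effects fit on the log10 response outside JMP would look roughly like this (a sketch using statsmodels; the file name and the response column 'y' are placeholders for my anonymised data):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("results.csv")     # hypothetical file name
df["log_y"] = np.log10(df["y"])     # variance proportional to response -> log scale

# Main-effects-only model, mirroring the 'Fit Model' setup in JMP
formula = "log_y ~ " + " + ".join(list("ABCDEFGHIJKLMNOPQR"))
fit = smf.ols(formula, data=df).fit()
print(fit.summary())                # coefficient table with per-factor t-tests
```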
As @statman pointed out, PB is a screening design and should be used to identify factors which aren't significant so they can be removed in future iterations. In my case it seems that most of my factors did have an effect and warrant further investigation, assuming that the size of the effect has real-world, and not just statistical, significance.
Does this seem like a sensible approach to you all? Many thanks again for your help.
Best wishes,
Noah
EDIT: I should add that I did consider ANOVA/linear regression in the first instance, but was concerned about overfitting with the number of factors that I have, and thought that the screening platform might be better at dealing with this. I'm not sure if that was an incorrect assumption!