Discussions

NominalGemsbok3

As part of a master’s thesis, a Definitive Screening Design with six continuous variables was conducted. We generated a standard design in JMP with 17 runs (+3 additional runs at the zero level).

The results surprised me in that a considerable number of interaction and quadratic effects are highly significant. My initial suspicion was overfitting, but I cannot find any indications supporting that. We have an excellent R-squared (which is expected), no elevated VIFs, an unremarkable Durbin–Watson test, etc. Furthermore, a PRESS value of 0.044 is achieved.

These findings remain the same or very similar even when strict heredity is disabled in the DSD analysis. In that case, 2–3 additional interaction or quadratic effects appear, which are again highly significant.

Based on everything I know, I currently see no reason to doubt the validity of the results. Or am I overlooking something?

NominalGemsbok3 · | Posted in reply to message from statman 04-14-2026

Could it be that the difference arises because this is a Definitive Screening Design, and I used the function DoE → Definitive Screening → Fit Definitive Screening in JMP for the analysis, while you did not? I believe that DSDs should definitely be analyzed using the appropriate method.

frankderuyck · | Posted in reply to message from NominalGemsbok3 04-15-2026

Be aware that in a DSD many interaction effects are correlated (highly aliased), with R > 0.7; see the pink fields below and Design Evaluation > Color Map on Correlations in the attached table. When you have 5 active factors, you will need to augment the DSD with extra runs to de-alias the interaction effects.

statman · | Posted in reply to message from NominalGemsbok3 04-15-2026

If you run the Fit Definitive Screening Design and select "Make Model", what do you end up with? You realize the p-value significance is based on the replicated center point.

"All models are wrong, some are useful" G.E.P. Box

frankderuyck · | Posted in reply to message from statman 04-15-2026

In the DSD result you can find 5 significant pure effects and also a couple of potentially significant but correlated interaction effects, which need to be de-aliased by adding extra runs.

frankderuyck · | Posted in reply to message from frankderuyck 04-15-2026

The attached table contains the DSD analysis using the Fit DSD platform; two significant correlated interaction effects are detected: (1) factor1*factor3, correlated with factor2*factor5, and (2) factor4*factor5, correlated with factor1*factor2.

Remark: as you can see in the pink fields of the correlation color map, there is also aliasing with the two-factor interactions involving factor6. However, as factor 6 is not an active effect, I did not take its interactions into account; a very good model can be constructed with interactions among the 5 significant main effects.

Victor_G · Apr 16, 2026 4:04 PM

Hi @frankderuyck and @NominalGemsbok3 ,

There is no ultimate best model, for multiple reasons: choice of performance metric, threshold (for the p-value, for example), estimation method, etc. There are also not enough unique treatments (degrees of freedom) in a DSD to estimate all effects, so you can easily end up with different but competing models with good performance. You could see a DSD as a kind of supersaturated design for a response surface model.

As @frankderuyck mentioned, because of the partial aliases/correlations between interaction effects, and also between quadratic effects, that come from the design structure, you can't be 100% sure about the "real" impact on your target of the interaction and quadratic effects that are detected (by any modeling method), unless you add runs to better inform your model. You can however have more confidence in the main effects, as the design structure avoids any correlation among main effects and between main effects and higher order effects, so you can estimate them without bias from those terms.
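The correlation structure described above can be checked numerically. The following is a minimal sketch, not the design from this thread: it assumes a minimal 13-run, six-factor DSD built from a hand-written order-6 conference matrix (the 17-run JMP design with extra center runs will give different correlation values). The variable names are illustrative only.

```python
import itertools
import numpy as np

# Assumed conference matrix of order 6: zero diagonal, +/-1 elsewhere.
C = np.array([
    [0,  1,  1,  1,  1,  1],
    [1,  0,  1, -1, -1,  1],
    [1,  1,  0,  1, -1, -1],
    [1, -1,  1,  0,  1, -1],
    [1, -1, -1,  1,  0,  1],
    [1,  1, -1, -1,  1,  0],
])
# Defining property of a conference matrix: C C' = (m - 1) I.
assert np.array_equal(C @ C.T, 5 * np.eye(6, dtype=int))

# Minimal DSD: the matrix, its fold-over, and a single center run.
X = np.vstack([C, -C, np.zeros((1, 6), dtype=int)])

# Build two-factor interaction and quadratic columns.
pairs = list(itertools.combinations(range(6), 2))
two_fi = np.column_stack([X[:, i] * X[:, j] for i, j in pairs])
quad = X ** 2

# Correlation matrix over all 27 candidate terms.
corr = np.corrcoef(np.column_stack([X, two_fi, quad]), rowvar=False)
n_main, n_2fi = 6, len(pairs)

main_block = corr[:n_main, :n_main] - np.eye(n_main)
main_vs_second = corr[:n_main, n_main:]
tfi_block = corr[n_main:n_main + n_2fi, n_main:n_main + n_2fi] - np.eye(n_2fi)

max_main_main = np.abs(main_block).max()
max_main_second = np.abs(main_vs_second).max()
max_2fi_2fi = np.abs(tfi_block).max()

print("max |r| main vs main:        ", round(max_main_main, 3))
print("max |r| main vs second order:", round(max_main_second, 3))
print("max |r| 2FI vs 2FI:          ", round(max_2fi_2fi, 3))
```

In this sketch the first two numbers are zero up to rounding, while the interaction columns are visibly correlated with each other, which is the reason extra runs are needed to separate them.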

"All models are wrong, but some are useful"
I tried to create a specific visualization called a raster plot (see "Raster plots or other visualization tools to help model evaluation and selection for DoEs" for how it was created) on this example to show this multiplicity of models, which comes from the combinatorial explosion of possible terms in the model. Besides the intercept, there are 27 possible effect terms to choose from: 6 main effects, 15 two-factor interactions and 6 quadratic effects. I used the Stepwise platform with the option "All Possible Models" (up to 10 terms in the model, with the strong heredity assumption). The resulting plot, sorted by Rsquare value, shows which terms (columns) are included in each model (rows).
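To give a rough sense of the size of this model space, here is a small counting sketch. It assumes that strong heredity means an interaction needs both parent main effects and a quadratic term needs its own main effect; the function names are made up for illustration, and the count may not match how JMP enumerates models internally.

```python
from math import comb

N_FACTORS = 6
N_TERMS = N_FACTORS + comb(N_FACTORS, 2) + N_FACTORS  # main + 2FI + quadratic

def count_unrestricted(max_terms):
    """Number of non-empty models with at most max_terms of the candidate terms."""
    return sum(comb(N_TERMS, k) for k in range(1, max_terms + 1))

def count_strong_heredity(max_terms):
    """Same count, but second-order terms require their parent main effects."""
    total = 0
    for m in range(N_FACTORS + 1):           # m = number of main effects in the model
        remaining = max_terms - m
        if remaining < 0:
            continue
        n_second = comb(m, 2) + m             # allowed interactions + quadratics
        ways = sum(comb(n_second, j) for j in range(min(remaining, n_second) + 1))
        total += comb(N_FACTORS, m) * ways
    return total - 1                          # drop the empty (intercept-only) model

print("candidate terms:", N_TERMS)
print("models, up to 10 terms, no heredity:    ", count_unrestricted(10))
print("models, up to 10 terms, strong heredity:", count_strong_heredity(10))
```

Even with the heredity restriction the number of candidate models is far larger than the number of runs, so several models fitting about equally well is to be expected.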

I prefer using an information criterion such as AICc (the lower the better) for comparing multiple models, as it penalizes the use of too many terms and allows a better comparison of models with different complexities.
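For readers who want to see how the penalty works, here is a minimal sketch of AICc for a least-squares model with normal errors. It assumes the parameter count includes the intercept, the regression coefficients and the error variance; the SSE values and term counts in the example are invented, not taken from this thread.

```python
import math

def aicc_least_squares(sse, n, n_coef):
    """AICc for a least-squares fit with normal errors.

    sse    : residual sum of squares
    n      : number of runs
    n_coef : number of regression coefficients, intercept included
    """
    k = n_coef + 1  # +1 for the estimated error variance
    if n - k - 1 <= 0:
        raise ValueError("too many parameters for the AICc correction")
    neg2_loglik = n * (math.log(2 * math.pi) + math.log(sse / n) + 1)
    return neg2_loglik + 2 * k + 2 * k * (k + 1) / (n - k - 1)

# Hypothetical comparison on 20 runs: a smaller model against a larger one
# that reduces the SSE only slightly.
small = aicc_least_squares(sse=0.040, n=20, n_coef=8)
large = aicc_least_squares(sse=0.035, n=20, n_coef=12)
print("AICc small model:", round(small, 2))
print("AICc large model:", round(large, 2))
```

With so few runs the small-sample correction term grows quickly as terms are added, which is why AICc tends to favor more compact models than Rsquare does.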

As you can see, most of the models agree on the presence of the main effects of the first 5 factors. For factor 6, the results differ and there is no obvious pattern for the presence of this main effect. For interactions and quadratic effects, it is also hard to see strong patterns, except that some higher order effects are not included most of the time: interactions factor 1 x factor 6, factor 2 x factor 4, factor 3 x factor 5, factor 3 x factor 6, factor 4 x factor 6 and factor 5 x factor 6. Among the quadratic effects, factor 6 x factor 6 is absent from most models. If we zoom in a little on the best models according to Rsquare value, there are some interesting observations on higher order effects:

Interactions Factor 1 x Factor 3 and Factor 4 x Factor 5 tend to be chosen often in models. Moreover, quadratic effects for factor 2 and factor 4 are also often selected. These results tend to agree with the results I obtained from the Fit Definitive Screening platform, with the same main effects and higher order effects detected.

When limiting the comparison to three different estimation methods, you can also see this situation of different and equivalent models and terms combination. For example with Fit Definitive Screening, GenReg Normal Pruned Forward and GenReg Two Stage Forward estimation methods, we can compare both the performances of the models and the terms included:

Performances: here with Rsquare and adjusted Rsquare for explanatory purposes (how much of the variability in the response the model explains):

We can see that the first two methods show similar performances.

Terms in the models:

Even though the first two estimation methods provide models with similar performances, the higher order terms they include are different. They only agree on the inclusion of the interaction effect Factor 1 x Factor 3.

So a reasonable follow-up would be to discuss with domain experts which model(s) are the most sensible, and to use the Augment Design platform to confirm and/or refine the most relevant model. You can, for example, augment the design and specify the model whose terms you want to estimate.

Please find the table with all scripts used in my response.
Hope this answer will help you,

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)

frankderuyck · | Posted in reply to message from Victor_G 04-16-2026

A DSD is still a screening design, appropriate for screening out strong effects, particularly main effects, from a larger set of potential factors. It also gives an indication of potential quadratic effects and interactions, but be aware that the latter are correlated (!); see the color map on correlations. If you get three or fewer active effects, the DSD will yield an RSM. But if there are more than 3 significant effects, you will need to augment the DSD to determine pure interaction and quadratic effects. Therefore, if brainstorming with experts suggests that interaction effects are likely, I always start with a minimal or small DSD that is sufficient to detect strong effects, so that enough runs are left in the budget for augmentation and de-aliasing. Also, I would not spend too much effort on center point replication.

frankderuyck · | Posted in reply to message from frankderuyck 04-17-2026

Off the record: isn't it strange that not all 5 active effects show up in the half-normal plot, and only three do?

statman · Apr 17, 2026 6:56 AM

You have to be careful evaluating the normal/half normal plots (See Daniel). You can't always use Lenth's PSE. If you look at the data, there appears to be more than one distribution of errors. Notice the bottom 6 values and then the break, then 2, then another break then 5. This is indicative of a change in noise during the experiment. It might be considered evidence of a special cause during the experiment. Look at the Pareto chart that coincides with this half normal plot. You will see the "grouping" of estimates more easily.
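For reference, Lenth's pseudo standard error (PSE) mentioned above is computed from the effect estimates alone, which is why a mix of error distributions can mislead it. Below is a minimal sketch of the usual definition; the estimates are invented for illustration and are not the values from this experiment.

```python
import numpy as np

def lenth_pse(estimates):
    """Lenth's pseudo standard error for a set of effect estimates."""
    abs_est = np.abs(np.asarray(estimates, dtype=float))
    s0 = 1.5 * np.median(abs_est)              # initial robust scale
    trimmed = abs_est[abs_est < 2.5 * s0]      # drop estimates that look active
    return 1.5 * np.median(trimmed)

# Hypothetical estimates: one clearly large effect among small ones.
estimates = [0.1, -0.2, 0.3, 8.0, -0.15]
pse = lenth_pse(estimates)
print("PSE:", pse)
print("estimate / PSE:", np.round(np.asarray(estimates) / pse, 2))
```

The calculation relies on most estimates being pure noise from a single distribution. If the small estimates fall into several distinct groups, as described above, the median-based scale no longer represents one common error and the resulting ratios should be read with caution.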

On a side note, replicating center points can be an excellent way to estimate the MSE. CPs can be used to test the linearity assumption quite efficiently, even though the test is not specific (it does not tell you which factor causes the curvature). If you run enough of them (~8) randomly throughout the experiment, they can also assess stability over the experiment: plot the moving range (MR) in run order. Also, if they are, for example, the current conditions, they allow you to set levels more boldly on either side of current to get better directional insight. The only problem is that the factors must be continuous.
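As a small illustration of the moving-range check, the sketch below computes the moving ranges of center-point responses in run order and the conventional upper limit for a moving range of two (3.267 times the average MR). The response values are hypothetical, and the helper names are only for this example.

```python
def moving_ranges(values):
    """Absolute differences between consecutive values, in run order."""
    return [abs(b - a) for a, b in zip(values, values[1:])]

def mr_upper_limit(values):
    """Conventional upper control limit for a moving range of size 2."""
    mrs = moving_ranges(values)
    return 3.267 * sum(mrs) / len(mrs)

# Hypothetical center-point responses, listed in the order they were run.
center_points = [10.2, 10.4, 10.1, 10.3, 11.6, 11.5, 11.7, 11.4]
mrs = moving_ranges(center_points)
limit = mr_upper_limit(center_points)

for run, mr in enumerate(mrs, start=2):
    flag = "  <-- above limit" if mr > limit else ""
    print(f"center point {run}: MR = {mr:.2f}{flag}")
print(f"upper limit = {limit:.2f}")
```

A moving range above the limit, or a visible shift in the center-point values themselves, would point to a change in noise or level during the experiment.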

"All models are wrong, some are useful" G.E.P. Box

frankderuyck · | Posted in reply to message from statman 04-17-2026

Interesting. How would you then use this plot to better detect active effects?

"Surprising" results in an DSD-Design