Discussions

sanch1 · Aug 20, 2024 08:37 AM

Hello,

I ran a definitive screening design and generated the table with the option "Add block with center runs to estimate quadratic effects" selected. I collected the response data and added it to the table. But when I try to run "Fit Definitive Screening", I get this error:

The Fit Definitive Screening platform only runs when I hide/exclude the runs from block 2. Am I doing something wrong? How do I run a model with all of the runs included? Do I have to do some kind of augmented design to include the second block?

Victor_G · Aug 21, 2024 8:27 AM

Disclaimer: I'm not a statistician; I use statistics in the most practical and pragmatic way. Sorry if my understanding is limited or my explanations are not clear or correct, I'm constantly learning :)

@statman I don't know which model you specified, but we didn't use the same assumed model (full quadratic + block, possible with a DSD). If you specify a full quadratic model, you get a singularity: the interaction term X1.X2 is a linear combination of other terms (see file with added script).
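A rough way to see where such singularities come from is to compare the number of model columns with the rank of the model matrix. The sketch below is only an illustration under assumptions: it builds a hypothetical 6-factor, 13-run DSD from an order-6 conference matrix, leaves out the block column, and is not the design or the script discussed in this thread. The helper names (`dsd_runs`, `model_matrix`, `rank`) are made up for the example.

```python
from fractions import Fraction
from itertools import combinations

# Order-6 conference matrix (zero diagonal, +/-1 elsewhere, C^T C = 5 I).
# Hypothetical example design, not the one from this thread.
C = [
    [0, 1, 1, 1, 1, 1],
    [1, 0, 1, -1, -1, 1],
    [1, 1, 0, 1, -1, -1],
    [1, -1, 1, 0, 1, -1],
    [1, -1, -1, 1, 0, 1],
    [1, 1, -1, -1, 1, 0],
]


def dsd_runs(conf):
    """Stack C, its foldover -C, and one center run (no block column)."""
    k = len(conf[0])
    foldover = [[-x for x in row] for row in conf]
    return [row[:] for row in conf] + foldover + [[0] * k]


def model_matrix(runs, quadratic=True, interactions=True):
    """Intercept + main effects, optionally squares and two-factor interactions."""
    k = len(runs[0])
    rows = []
    for r in runs:
        row = [1] + list(r)
        if quadratic:
            row += [x * x for x in r]
        if interactions:
            row += [r[i] * r[j] for i, j in combinations(range(k), 2)]
        rows.append(row)
    return rows


def rank(matrix):
    """Exact matrix rank by Gaussian elimination over fractions."""
    m = [[Fraction(x) for x in row] for row in matrix]
    rk = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(rk + 1, len(m)):
            f = m[i][col] / m[rk][col]
            if f:
                m[i] = [a - f * b for a, b in zip(m[i], m[rk])]
        rk += 1
        if rk == len(m):
            break
    return rk


runs = dsd_runs(C)
X_main = model_matrix(runs, quadratic=False, interactions=False)
X_full = model_matrix(runs)

print("runs:", len(runs))
print("main-effects model: columns", len(X_main[0]), "rank", rank(X_main))
print("full quadratic model: columns", len(X_full[0]), "rank", rank(X_full))
```

Whenever the rank is smaller than the number of columns, some terms are exact linear combinations of others, so the software has to zero or alias them and no residual degrees of freedom remain for standard errors or p-values.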

You're right that you can still add as many terms as you want to the model here. But practically, how do you know which terms to remove from the full model if you have biased estimates (zeroed, biased, or without any standard error in your script), missing p-values, and no metrics to guide model selection and refinement? You basically have no error estimate and no reference for comparison or testing.

In your situation and assumed model, there is no indication of which term to remove (no p-values, no standard errors for the term estimates, ...), and new users may keep the model as it is because "R² is equal to 1, so it's a perfect model".
In my assumed model, only the main effects and the block effect can be estimated and tested properly when fitting the full model to the data. None of the other terms can be estimated properly, so refining the model backward from this situation can become very tricky.
Stepwise and related methods help guide the user and test many different models, but model selection shouldn't rely solely on statistical accuracy or fit metrics. Domain expertise and validation runs are essential to confirm the reliability of a model.

Hope this clarifies my response,

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)

statman · Aug 21, 2024 8:51 AM

Just a note, the only reason I posted was the title of the thread. It mentions workflow.

I'll refer you to Cuthbert Daniel (Daniel Plots aka normal and half normal plots) and G.E.P. Box (also adds Bayes plots) for methods of analyzing saturated models. Pareto plots where you indicate practical significance on the Y axis are also quite useful. Just leaving terms out of the model biases the MSE estimate and can lead to misinterpretation of the data.
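As a minimal sketch of the half-normal idea, the coordinates of such a plot can be computed by sorting the absolute effect estimates and pairing them with half-normal quantiles. The effect values below are hypothetical and only for illustration, the function name `half_normal_points` is made up, and the plotting position `0.5 + 0.5 * (i - 0.5) / m` is one common convention among several.

```python
from statistics import NormalDist


def half_normal_points(effects):
    """Return (quantile, |effect|, name) triples, sorted by |effect|."""
    ordered = sorted(effects.items(), key=lambda kv: abs(kv[1]))
    m = len(ordered)
    std_normal = NormalDist()
    points = []
    for i, (name, value) in enumerate(ordered, start=1):
        # Half-normal quantile for the i-th smallest absolute effect.
        p = 0.5 + 0.5 * (i - 0.5) / m
        points.append((std_normal.inv_cdf(p), abs(value), name))
    return points


# Hypothetical effect estimates from a saturated two-level model.
effects = {
    "X1": 8.2,
    "X2": -0.4,
    "X3": 0.9,
    "X1*X2": -5.1,
    "X1*X3": 0.3,
    "X2*X3": -0.7,
    "X1*X2*X3": 0.5,
}

for q, abs_effect, name in half_normal_points(effects):
    print(f"{name:10s} |effect| = {abs_effect:5.2f}   quantile = {q:5.2f}")
```

Plotting |effect| against the quantile, inactive effects should fall roughly on a line through the origin, while effects that sit well above that line are candidates for being active.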

Daniel, Cuthbert (1959), “Use of Half-Normal Plots in Interpreting Factorial Two-Level Experiments”, Technometrics, Vol. 1, No. 4, November

Box, G.E.P., Daniel Meyer (1993), “Finding the Active Factors in Fractionated Screening Experiments”, Journal of Quality Technology, Vol. 25, No. 2, April

See how Dr. Box analyzes experiments in this paper:

Box, G.E.P., Stephen Jones (1992), “Split-plot designs for robust product experimentation”, Journal of Applied Statistics, Vol. 19, No. 1

"All models are wrong, some are useful" G.E.P. Box

Definitive Screening Design Workflow