Hi @statman,
Thanks a lot for your input and comments, which are always instructive and thoughtful for new and experienced users alike.
I tend to agree with you about the Stepwise approach on the "theoretical" side: for non-(super)saturated designs, the assumed model should be your starting point, before you consider refining it based on statistical criteria and practical evaluation/validation.
With Definitive Screening Designs, the situation is a little different from traditional designs, since you can't estimate all the terms that could enter the model. In the situation here, with 8 factors and 1 block, that would mean estimating 1 intercept, 8 main effects (+ 1 block effect), 28 two-factor interactions, and 8 quadratic effects. The design used here has 26 runs, so it can't estimate a full RSM model with the 46 terms listed above, and no backward/subtractive approach is possible without strong assumptions or simplifications.
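To make the term count above easy to check, here is a minimal Python sketch of the arithmetic. The function name and its parameters are my own, for illustration only; it simply reproduces the counts given above and compares them with the 26 runs.

```python
from math import comb

def full_rsm_term_count(n_factors, n_block_terms=1):
    """Count the terms of a full RSM model (hypothetical helper, for illustration)."""
    intercept = 1
    main_effects = n_factors
    interactions = comb(n_factors, 2)   # two-factor interactions
    quadratics = n_factors              # one squared term per factor
    return intercept + main_effects + n_block_terms + interactions + quadratics

n_terms = full_rsm_term_count(8, n_block_terms=1)  # 1 + 8 + 1 + 28 + 8
n_runs = 26
print(n_terms, n_runs, n_terms > n_runs)  # 46 26 True
```

With more terms than runs, the full model cannot be fitted, which is why a backward elimination from the full RSM model is not available here.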
Hence the need for a specific analysis strategy, which is available in the "Fit DSD" platform. When it is possible, "Fit DSD" is the recommended analysis, as it is a more conservative strategy than Stepwise approaches. It assumes that the factor sparsity and effect heredity principles hold: it estimates and fits the main effects first, then considers only the interactions and quadratic effects that respect effect heredity, estimating them from the residuals of the main effects model.
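To illustrate how much effect heredity narrows the search, here is a small Python sketch. It is not the Fit DSD algorithm, only a way to list which second-order terms remain candidates once some main effects are declared active. The function name, the factor names X1..X8 and the choice of active factors are all hypothetical.

```python
from itertools import combinations

def second_order_candidates(factors, active, strong=True):
    """List second-order terms allowed by effect heredity (illustrative helper).

    Strong heredity: an interaction needs both parent main effects active.
    Weak heredity: at least one parent active. A quadratic term needs its
    own main effect active in both cases.
    """
    active = set(active)
    quadratics = [f"{f}*{f}" for f in factors if f in active]
    interactions = []
    for a, b in combinations(factors, 2):
        n_active = (a in active) + (b in active)
        if n_active == 2 or (not strong and n_active == 1):
            interactions.append(f"{a}*{b}")
    return quadratics + interactions

factors = [f"X{i}" for i in range(1, 9)]   # 8 factors, as in the design discussed
active = ["X1", "X3", "X6"]                # hypothetical active main effects

strong_terms = second_order_candidates(factors, active, strong=True)
weak_terms = second_order_candidates(factors, active, strong=False)
print(len(strong_terms), len(weak_terms))  # 6 21
```

With three active main effects, strong heredity leaves 6 of the 36 second-order terms to consider, which is what makes the second stage estimable with so few runs.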
When Fit DSD is not possible (because of missing values, excluded rows, added replicates, or anything else that destroys the foldover structure and prevents you from using the recommended analysis approach for DSDs), you have to find something else in practice. Stepwise may be an option, even if its "brute-force", greedy approach may not be optimal in the context of designed experiments. Generalized Regression models are another option, using the "Two Stage Forward Selection", "Pruned Forward Selection" or "Best Subset" estimation methods with Effect Heredity enforced, but they are only available in JMP Pro.
I particularly like the "All Models" option in the Stepwise platform (for a limited number of factors and terms in the model), not to build the "best" model directly by brute force, but to guide the understanding and evaluation of several models and to choose the most likely active terms for the final model. This can be visualized through "Raster plots", introduced in the context of model selection for DoE by Peter Goos and proposed in the JMP Wish List: Raster plots or other visualization tools to help model evaluation and selection for DoEs
This visualization helps to identify the most likely active terms and to see where and how models agree or disagree. It can also help visualize aliasing between effects. Example from a use case by Peter Goos:
@sanch1 In the end, "All models are wrong but some are useful", so it's always interesting to try and compare different modeling options, even more so when domain expertise can guide the process. Some methods are more conservative than others, but combining different modeling approaches with domain expertise can give you a broader view of what matters most. From there, you can plan your next experiments to augment your DoE, confirm/refine/correct your model, and prepare some validation points to assess your model's validity.
If you need more information or are interested in diving deeper into the analysis of DSDs, there are other resources/posts that could help you:
I hope this complementary answer may be helpful,
Victor GUILLER
L'Oréal Data & Analytics
"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)