Hi @frankderuyck,
I agree with @statman's "philosophical" point about stepwise regression: it is better suited to "non-designed" datasets, where the goal is to uncover factors and effects in the absence of an a-priori model. Traditional p-value-based model selection is not useful for mixtures: because the components sum to one, the estimated effects are multicollinear (aliased with each other), so the standard errors of the estimates are quite large and the resulting p-values are distorted and misleading.
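To make that aliasing concrete, here is a minimal sketch with simulated data (hypothetical, not from this thread): since the three components sum to one, an intercept column is an exact linear combination of them, which inflates the condition number; the no-intercept Scheffé fit avoids the problem.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
X = rng.dirichlet([1.0, 1.0, 1.0], size=30)   # x1 + x2 + x3 == 1 on every row
y = 2 * X[:, 0] + 3 * X[:, 1] + X[:, 2] + rng.normal(0, 0.1, 30)

with_intercept = sm.OLS(y, sm.add_constant(X)).fit()
print(with_intercept.condition_number)        # enormous: intercept aliased with x1+x2+x3

scheffe = sm.OLS(y, X).fit()                  # Scheffé model: no intercept
print(scheffe.params, scheffe.bse)            # stable estimates and standard errors
```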
For (model-based) mixture designs, forward stepwise regression is possible, but only with safeguards and caution (for example, force the main effects into the model and fit with no intercept). Two options are worth considering for mixture designs, both highlighted by Dr. Philip J. Ramsey in one of his presentations, "Analysis Strategies for Constrained Mixture and Mixture Process Experiments Using JMP Pro 14":
- Traditional Forward selection using the pseudo factor method of Miller
- Traditional Forward selection using fractionally weighted bootstrapping and auto-validation: SVEM in JMP Pro, Generalized Regression platform (Gotwalt and Ramsey); a simplified sketch of the weighting idea follows below
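JMP Pro's SVEM runs forward selection inside Generalized Regression; the sketch below only illustrates the fractionally weighted bootstrap idea, with a lasso as a stand-in for the selection step. X (the expanded Scheffé terms) and y are hypothetical, and svem_like is an illustrative name, not a JMP API.

```python
import numpy as np
from sklearn.linear_model import Lasso

def svem_like(X, y, alphas=np.logspace(-4, 0, 20), n_boot=50, seed=0):
    """Average per-bootstrap models tuned on anti-correlated fractional weights."""
    rng = np.random.default_rng(seed)
    n = len(y)
    coefs = []
    for _ in range(n_boot):
        u = rng.uniform(size=n)
        w_train = -np.log(u)                  # exponential training weights
        w_valid = -np.log1p(-u)               # anti-correlated validation weights
        best_coef, best_err = None, np.inf
        for a in alphas:
            m = Lasso(alpha=a, fit_intercept=False, max_iter=10_000)
            m.fit(X, y, sample_weight=w_train)
            err = np.sum(w_valid * (y - m.predict(X)) ** 2)
            if err < best_err:
                best_coef, best_err = m.coef_, err
        coefs.append(best_coef)               # keep the best model of this draw
    return np.mean(coefs, axis=0)             # averaged coefficients = the ensemble
```

Every run is used for both training and validation, but with anti-correlated weights, which is what lets this approach "auto-validate" on small designed experiments without holding out whole runs.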
Dr. Ramsey does not recommend "All Possible Models" for mixtures, in part because the pure-component terms must be forced into every model. From a pragmatic perspective, the method is also computationally demanding (and not very effective): all possible models with various numbers of effects are constructed, regardless of hierarchy and heredity between effects. A large share of the candidate models is therefore neither interesting nor relevant to consider, yet all of them are built and evaluated in an agnostic, "brute-force" way.
Since an a-priori model is assumed for model-based mixture designs, backward elimination is safer, combined with a validation method based on an information criterion like AICc (Generalized Regression platform in JMP Pro, or Backward Stepwise in JMP). A minimal sketch of this loop outside JMP is shown below.
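This is a sketch under assumptions: the DataFrame df already holds the pure components and the candidate blend terms as columns (hypothetical names), and the AICc constant is the usual small-sample correction, which may differ slightly from JMP's.

```python
import statsmodels.api as sm

def aicc(fit):
    # Small-sample corrected AIC; k counts the coefficients plus the error variance.
    n, k = fit.nobs, len(fit.params) + 1
    return fit.aic + 2 * k * (k + 1) / (n - k - 1)

def backward_aicc(df, response, forced, candidates):
    """Drop one candidate term at a time while AICc improves;
    the pure-component terms in `forced` always stay in the model."""
    def fit_with(terms):
        return sm.OLS(df[response], df[forced + terms]).fit()   # no intercept
    terms = list(candidates)
    current = fit_with(terms)
    while terms:
        trials = [(aicc(fit_with([t for t in terms if t != d])), d) for d in terms]
        best_score, best_drop = min(trials)
        if best_score >= aicc(current):
            break                      # no single removal improves AICc
        terms.remove(best_drop)
        current = fit_with(terms)
    return current
```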
Since you're more interested in predictive performance than in factor screening, you can also run the backward elimination "manually": start from the full model with all the assumed effects (Standard Least Squares in JMP) and remove terms as long as the model's RMSE (prediction error) keeps decreasing. A small helper for scoring each candidate term set is sketched below.
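To make that manual pass less optimistic, you can score each candidate term set with cross-validated RMSE rather than the in-sample value. A small helper, assuming the same hypothetical DataFrame layout as above:

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def cv_rmse(df, response, terms, folds=5):
    # Cross-validated RMSE of a no-intercept (Scheffé) linear model.
    model = LinearRegression(fit_intercept=False)
    scores = cross_val_score(model, df[terms], df[response],
                             scoring="neg_root_mean_squared_error", cv=folds)
    return -scores.mean()
```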
For model-agnostic mixture designs (like space-filling designs), machine learning methods, which are efficient and effective at interpolation, can be very useful for building a predictive model on the homogeneously distributed points (though overfitting may happen quickly): SVM, neural networks, k-nearest neighbors, Gaussian processes, ...
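For example, a minimal Gaussian Process sketch on simulated mixture points (Dirichlet draws as a crude stand-in for a space-filling design), with a holdout set to watch for the overfitting mentioned above:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.dirichlet([1.0, 1.0, 1.0], size=60)   # simulated mixture runs
y = np.sin(4 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(0, 0.05, 60)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(X_tr, y_tr)
print("holdout R^2:", gp.score(X_te, y_te))   # a big train/test gap flags overfitting
```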
In the end, validation runs may be necessary to confirm the model and estimate its predictive performance.
I hope this additional response will help you,
Victor GUILLER
L'Oréal Data & Analytics
"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)