Hi @frankderuyck,
SVEM analysis works great in two situations:
- To build predictive models with very limited data, by stacking individual models that have been trained and validated on slightly different data (different weights for training and validation).
- To assess practical significance and increase the precision of effect estimates for single, unreplicated experiments in a machine-learning way, thanks to model stacking.
Whatever the situation, it's important to remember the context in which the SVEM technique was introduced: to provide a validation framework for small datasets where no other validation technique is possible.
Talks and presentations:
- Re-Thinking the Design and Analysis of Experiments? (2021-EU-30MP-776) - JMP User Community
- SVEM: A Paradigm Shift in Design and Analysis of Experiments (2021-EU-45MP-779) - JMP User Community
So, conceptually, I have mixed feelings about the idea of combining SVEM and replicates, as it merges two model-building methodologies that rely on different concepts, objectives, and validation approaches: screening based on statistical significance and error/noise estimation (designs with replicates, classical statistical modeling) versus predictive modeling based on predictive accuracy and bootstrapping/model stacking.
In the presence of replicates, I fear that data leakage may occur when using SVEM for model building: since each run receives random (and anticorrelated) training and validation weights, replicate runs can get different emphasis in training and validation even though they represent the same treatment! The model building may therefore be slightly biased (it sees information from both the training and validation sets through replicates carrying different weights), which may result in "overconfidence" and a large decrease in the standard errors of the term estimates.
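To make the leakage mechanism concrete, here is a minimal sketch in Python of the fractionally weighted bootstrap that underlies SVEM (training weights -ln(u), anticorrelated validation weights -ln(1-u)). This is only an illustration of the idea, not JMP's implementation: the Lasso base learner, the alpha grid, and all names are my own assumptions. Because u is drawn independently per run, two replicates of the same treatment can land on "opposite sides" of the weighting, which is exactly the leakage I describe above.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

def svem_sketch(X, y, n_boot=200, alphas=np.logspace(-3, 1, 25)):
    """Illustrative SVEM-style fit: fractionally weighted bootstrap with
    anticorrelated training/validation weights, then model averaging."""
    coefs = []
    for _ in range(n_boot):
        # One independent draw per RUN: replicate runs of the SAME
        # treatment get different, anticorrelated weight pairs.
        u = rng.uniform(1e-12, 1.0, size=len(y))
        w_train = -np.log(u)        # fractional training weights
        w_valid = -np.log1p(-u)     # = -ln(1 - u), validation weights
        best_model, best_err = None, np.inf
        for a in alphas:
            m = Lasso(alpha=a).fit(X, y, sample_weight=w_train)
            # keep the model that minimizes validation-weighted error
            err = np.sum(w_valid * (y - m.predict(X)) ** 2)
            if err < best_err:
                best_model, best_err = m, err
        coefs.append(np.r_[best_model.intercept_, best_model.coef_])
    return np.mean(coefs, axis=0)   # stacked/averaged coefficients
```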
When increasing the variability between replicates in the same example (the Reactor 20 Custom JMP dataset with replicate runs added, up to 30 runs, vs. the original Reactor 20 Custom dataset), you can see that the two situations, SVEM on a design with replicates and SVEM without replicates, are fairly similar in terms of estimates and standard errors.
So again, I don't see the benefit of merging these two techniques: you would be combining two (very) different validation approaches, which is not what SVEM was created for.
I'm also skeptical about using SVEM to estimate within-replicate variation: if you have replicates, a standard analysis already gives you this information directly, and SVEM, like bootstrapping, may underestimate the standard deviation. See the great interactive visualization of bootstrapping: Seeing Theory - Frequentist Inference
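To show why no resampling is needed for this, here is a short sketch of the standard pure-error calculation from replicates; the group names and response values are made up purely for illustration. The pooled within-replicate variance falls straight out of the replicate structure:

```python
import numpy as np

# Hypothetical replicate responses grouped by treatment (illustrative only).
replicates = {
    "treatment_A": [83.1, 84.0, 82.7],
    "treatment_B": [67.5, 66.9],
}

# Pooled pure-error estimate, as in a standard DoE analysis:
ss = sum(np.sum((np.asarray(v) - np.mean(v)) ** 2) for v in replicates.values())
df = sum(len(v) - 1 for v in replicates.values())
print(f"pure-error SD: {np.sqrt(ss / df):.3f} on {df} df")
```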
For others interested in this discussion, here are the recordings of DoE Club Q4: Recordings DOE Club Q4 2024
Hope the response makes sense, and I'd be interested in input from other members,
Victor GUILLER
L'Oréal Data & Analytics
"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)