Hi @lazzybug,
I may have answered a little too quickly about the random block effect and the suggestion to remove it, sorry for that.
A general question before my comments/answers: since you mentioned simulated results, do you know the true model behind Y, so that the different models can be compared against the "true" simulation model?
The random blocking effect helps you reduce noise in the analysis by creating blocks within which the variability is small. This makes it easier to identify and sort out significant effects.
- On a "conceptual" level, since you have designed your DoE with it, it seems more relevant to keep this random block effect.
Even if non-significant, your random block effect seems to be quite effective at removing noise from the experiments, as it captures 70% (in your first model) to 83% (in your second model) of the total variance through this block effect. So even if non-significant, you certainly gain precision in your model by keeping it (this can also be seen by comparing the RMSEs of your models 1, 2 and 3; see the sketch just after this list for one way to compute this variance share).
- On an "analysis" level, keeping your random block effect helps you reduce variance in your model (and predictions).
What are the consequences of removing the random blocking effect?
- The unexplained error in the model will be larger (as you can see by comparing the RMSE of your model 3 to models 1 and 2; a sketch of this comparison follows the list),
- Therefore, the test of the treatment effects might be less powerful, so you might miss/remove some significant effects due to the increased "noise"/error (that's why you may end up with a more parsimonious model in model 3).
- The parameter estimates and their standard errors may change (as also happens when adding/removing an effect during modeling).
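As a rough illustration of the RMSE point above (again with placeholder names, and assuming the same simple fixed-effects structure as in the previous sketch), you can refit the same fixed effects with and without the random block term and compare the unexplained error:

```python
# Same fixed effects, with and without the random block term.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("doe_results.csv")  # placeholder

with_block = smf.mixedlm("Y ~ X1 + X2 + X3", data=df,
                         groups=df["Block"]).fit(reml=True)
no_block = smf.ols("Y ~ X1 + X2 + X3", data=df).fit()

# Dropping the block term pushes the block-to-block variability into the
# residual, so the unexplained error is typically larger.
print("Residual SD with block:   ", np.sqrt(with_block.scale))
print("Residual SD without block:", np.sqrt(np.mean(no_block.resid ** 2)))
```

The larger residual error then inflates the standard errors of the factor effects, which is exactly why some effects may drop out and leave you with the more parsimonious model 3.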
In the present case, unless you have the true simulated model for Y to know which model performs best, it is hard to choose one model over another, since they are very close to each other (looking at which effects are in each model and at their parameter estimates). You can also compare the different prediction profilers (see the script "Comparison of models - Profiler" in the datatable); they seem to agree on the importance of the factors and on which level of each factor appears optimal. There are some differences in the curvature of some factors (for example, X1 shows some curvature in model 3, whereas X3 shows some curvature in models 1 and 2 with the random block).
I would still prefer model 2, as it takes the random blocking into account and seems to perform better, but this may not be the definitive choice.
In the end, let's remember George Box's reminder that "All models are wrong but some are useful". The choice of model should also be made in consideration of its "usefulness" and in accordance with domain expertise.
If in doubt, you can still add some validation runs that may help you see/compare which model(s) generalize better than the other(s); a rough sketch of such a comparison is below.
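For example (again a sketch with placeholder file and column names, and stand-in formulas for your actual candidate models), you could score each candidate on the new runs:

```python
# Sketch: score candidate models on a few held-out validation runs.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

train = pd.read_csv("doe_results.csv")      # placeholder training data
valid = pd.read_csv("validation_runs.csv")  # placeholder new runs

# Stand-ins for your models 2 (with random block) and 3 (without)
m2 = smf.mixedlm("Y ~ X1 + X2 + X3", data=train,
                 groups=train["Block"]).fit(reml=True)
m3 = smf.ols("Y ~ X1 + X2", data=train).fit()

def rmse(y, yhat):
    return np.sqrt(np.mean((y - yhat) ** 2))

# For new runs the block is unknown, so the mixed model predicts at the
# fixed-effects (population-average) level.
print("Model 2 validation RMSE:", rmse(valid["Y"], m2.predict(valid)))
print("Model 3 validation RMSE:", rmse(valid["Y"], m3.predict(valid)))
```

The model with the smaller validation RMSE generalizes better on these new runs.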
Sorry for this "non-deterministic" answer, but I hope this will help you,
Happy New Year!
Victor GUILLER
L'Oréal Data & Analytics
"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)