Hi @AAYH,
First, is this DSD linked to the problem of RMSE changing with validation data, or is it a separate question? I am just trying to see whether this is the example dataset that illustrates your problem, or something in addition to it. What is your objective with this DoE? Note that there is a missing value in your data table for the second uncontrolled variable.
If I understand correctly, you have added a blocking factor (did you create it as a categorical factor when building the DSD?) and 2 uncontrollable variables that you recorded but that are not randomized/controlled in a DoE way.
As you have already seen, there are strong correlations between your two uncontrollable variables, and between Block and uncontrolled variables 1/2, which may greatly affect the predictive performance of your model on new data (script "1. Multivariate" in the attached data table).
I have added some column properties to your random block factor and uncontrollable variables (so that JMP platforms handle them correctly), and then fitted two models:
- One with the Mixed model personality, specifying "Block" as a random effect (script "2.a. Fit Mixed"),
- One with the Standard Least Squares personality with "Block" as a random effect (script "2.b. Fit Model (with random Block)").
Not surprisingly, the two models provide similar results, and as you expected, uncontrollable variables 1 and 2 seem to have an important effect on the response Quality. The random block factor doesn't seem to be significant.
The models themselves are reasonable (depending on the precision you expect), with a high R² (0.87) and a fairly low RMSE (around 0.5). However, as already seen in the Multivariate platform (correlations), some of your factors are strongly correlated with each other, which creates high VIFs (around 20 for the two uncontrolled variables) and may increase the variance of predictions.
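To give a feel for how correlation translates into VIF, here is a minimal Python sketch (not JSL, and not what JMP computes internally). It assumes the simplified case where only the two uncontrolled variables are involved, so that VIF = 1 / (1 - r²); in that case a VIF of about 20 corresponds to |r| ≈ 0.975. In the actual model the VIF of a term depends on all the other terms, so this is only an approximation. The data and the names `u1` and `u2` are synthetic and hypothetical, not taken from your table.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF when each predictor is regressed on the other one only."""
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Synthetic, hypothetical data: u2 is u1 plus a small amount of noise,
# which mimics two strongly correlated uncontrolled variables.
rng = random.Random(0)
u1 = [rng.gauss(0, 1) for _ in range(30)]
u2 = [a + rng.gauss(0, 0.2) for a in u1]

r = pearson(u1, u2)
vif = vif_two_predictors(u1, u2)
print(f"r = {r:.3f}, VIF = {vif:.1f}")
```

With uncorrelated predictors the same function returns a VIF of 1, which is the reference value to compare against.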
One option to reduce this collinearity could be to create principal components from these 2 variables (script "3.a. PCA of uncontrolled variables 1&2"), and fit the same type of model as before, replacing uncontrolled variables 1&2 by their 2 principal components (script "3.b. Fit Model with PCs").
This won't change the performance of the model in terms of R² and RMSE, but decreasing the VIFs helps reduce the variance of the parameter estimates (for the parameters involved in this collinearity).
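The two claims above (same R², no more correlation between the replaced terms) can be checked on toy data. The Python sketch below is only an illustration under stated assumptions: PCA is done on correlations (standardized variables), for which the two components of a pair of variables are proportional to their sum and their difference; the data, the response `y` and the helper `fit_r2` are hypothetical and do not come from your table or from JMP.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def standardize(x):
    """Center to mean 0 and scale to unit sample standard deviation."""
    n = len(x)
    m = sum(x) / n
    s = math.sqrt(sum((a - m) ** 2 for a in x) / (n - 1))
    return [(a - m) / s for a in x]


def fit_r2(columns, y):
    """R² of a least squares fit of y on the given columns plus an intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
         + [sum(X[k][i] * y[k] for k in range(n))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda row: abs(A[row][c]))
        A[c], A[piv] = A[piv], A[c]
        for row in range(p):
            if row != c:
                f = A[row][c] / A[c][c]
                A[row] = [a - f * b for a, b in zip(A[row], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    my = sum(y) / n
    ss_res = sum((a - f) ** 2 for a, f in zip(y, fitted))
    ss_tot = sum((a - my) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Synthetic, hypothetical data: two strongly correlated variables and a response.
rng = random.Random(1)
u1 = [rng.gauss(0, 1) for _ in range(30)]
u2 = [a + rng.gauss(0, 0.2) for a in u1]
y = [2.0 * a - 1.0 * b + rng.gauss(0, 0.5) for a, b in zip(u1, u2)]

# PCA on correlations for two variables: components along (1, 1) and (1, -1).
z1, z2 = standardize(u1), standardize(u2)
pc1 = [(a + b) / math.sqrt(2) for a, b in zip(z1, z2)]
pc2 = [(a - b) / math.sqrt(2) for a, b in zip(z1, z2)]

r2_raw = fit_r2([u1, u2], y)
r2_pcs = fit_r2([pc1, pc2], y)
print(f"corr(u1, u2)   = {pearson(u1, u2):.3f}")
print(f"corr(pc1, pc2) = {pearson(pc1, pc2):.3f}")
print(f"R2 raw = {r2_raw:.4f}, R2 with PCs = {r2_pcs:.4f}")
```

The R² is identical because the two components are an invertible linear transformation of the two original variables, so both models span the same space; only the parameterization (and therefore the VIFs and the parameter estimates) changes.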
Concerning your initial question, whether you can use the data you have now: it all depends on your objective (explanation and/or prediction), and on how you evaluate the "usefulness" of your data (prediction precision?).
From this first data table, there are already some important things to notice, but without further information it is hard to interpret the results or draw conclusions about the use of the data (for example, how representative the ranges of uncontrolled variables 1 and 2 are, or how the missing value should be handled). It is still an interesting first step from which you could augment your design, to better take your 2 uncontrolled variables into account by turning them into controlled factors in your experimentation.
Note that what I have done may not be the best approach depending on your objectives and needs, and that other options may also be available for the same tasks.
Victor GUILLER
Data & Analytics