Hi @Victor_G
Thank you for the detailed response. This is very helpful.
Just to clarify one point: in the script I shared, I had already removed the X2_Status and X3_Status terms from the model myself because I was concerned that they might be redundant. Your explanation confirms that concern: since X2 = 0 already implies OFF and X2 > 0 implies ON, a separate status factor would add redundant information and potential collinearity.
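To check this for myself, I worked through a small example. It is only a sketch: it assumes a three-level factor with levels 0, 15, and 30, and S is a hypothetical 0/1 indicator standing in for X2_Status. Because three points determine a quadratic exactly, the indicator can be written as a combination of the linear and quadratic X2 terms:

```latex
\begin{align*}
S &= \begin{cases} 0, & X_2 = 0 \\ 1, & X_2 > 0 \end{cases}
  \qquad X_2 \in \{0, 15, 30\} \\[4pt]
S &= \frac{X_2}{10} - \frac{X_2^2}{450} \\[4pt]
X_2 = 0:&\quad 0 - 0 = 0 \\
X_2 = 15:&\quad 1.5 - 0.5 = 1 \\
X_2 = 30:&\quad 3 - 2 = 1
\end{align*}
```

So under these assumptions, if both the linear and quadratic X2 terms are in the model, the status column adds no new information for that factor. With only the linear X2 term it would be correlated with X2 rather than exactly aliased.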
So, if I understand correctly, the more direct approach (Option 1) does make sense.
Based on your response, I am also thinking that three discrete numeric levels may be sufficient instead of four: for example, 0, the lowest practical active level, and the highest practical active level. This may be more appropriate because the step from 0 to the lowest active level is not the same size as the step from the lowest to the highest active level.
Following your suggestion, this is the design structure I think makes more sense. I removed the separate categorical status factors and kept X2 and X3 as discrete numeric factors, where 0 represents the OFF/absent condition and the non-zero values represent active levels.
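The model includes the intercept, main effects for all three factors, a quadratic effect for X1, and all three two-factor interactions. The quadratic effects for X2 and X3 are entered as potential terms: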
DOE(
    Custom Design,
    {Add Response( Maximize, "Y", ., ., . ),
    Add Factor( Continuous, -1, 1, "X1", 0 ),
    Add Factor( Discrete Numeric, {0, 15, 30}, "X2", 0 ),
    Add Factor( Discrete Numeric, {0, 30, 45}, "X3", 0 ),
    Set Random Seed( 1139260218 ),
    Number of Starts( 71985 ),
    Add Term( {1, 0} ),             // intercept
    Add Term( {1, 1} ),             // X1 main effect
    Add Term( {2, 1} ),             // X2 main effect
    Add Potential Term( {2, 2} ),   // X2 quadratic (potential term)
    Add Term( {3, 1} ),             // X3 main effect
    Add Potential Term( {3, 2} ),   // X3 quadratic (potential term)
    Add Term( {1, 2} ),             // X1 quadratic
    Add Term( {1, 1}, {2, 1} ),     // X1*X2 interaction
    Add Term( {1, 1}, {3, 1} ),     // X1*X3 interaction
    Add Term( {2, 1}, {3, 1} ),     // X2*X3 interaction
    Set Sample Size( 12 ),
    Simulate Responses( 0 ),
    Save X Matrix( 0 )}
);
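For context, this is roughly the fit I have in mind once the runs are collected. It is only a sketch: it assumes the data table columns keep the names Y, X1, X2, and X3, it lists only the required (non-potential) terms from the design, and it uses Center Polynomials( 0 ) to turn off polynomial centering, which is the setting my question below is about. I have not checked whether the Coding column property that Custom Design may attach to the factors also needs to be changed, so please treat that as an open assumption.

```jsl
// Sketch only: column names are assumed to match the factor names in the design.
Fit Model(
    Y( :Y ),
    Effects(
        :X1, :X2, :X3,                   // main effects
        :X1 * :X1,                       // X1 quadratic
        :X1 * :X2, :X1 * :X3, :X2 * :X3  // two-factor interactions
    ),
    Center Polynomials( 0 ),             // turn off centering of polynomial and interaction terms
    Personality( "Standard Least Squares" ),
    Emphasis( "Effect Leverage" ),
    Run
);
```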
One point I want to check is about model fitting and interpretation. Since 0 for X2 and X3 represents a real OFF/absent condition, and not just a coded low level, I assume I should be careful with automatic coding or centering of polynomial terms. My concern is that centering may make the model coefficients harder to interpret in relation to the actual OFF/absent state. Would you recommend fitting these terms using the actual numeric values, or is there a better coding strategy for this type of factor?
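To make the concern concrete, here is a simplified sketch with only X1, X2, and their interaction. The symbols β, γ, and the centering constants x̄ are my own notation, not JMP output, and I am assuming that centering simply subtracts a constant from each factor inside the cross-product:

```latex
\begin{align*}
\text{Centered:}\quad
Y &= \beta_0 + \beta_1 X_1 + \beta_2 X_2
     + \beta_{12}\,(X_1 - \bar{x}_1)(X_2 - \bar{x}_2) \\[4pt]
\text{Uncentered:}\quad
Y &= \gamma_0 + \gamma_1 X_1 + \gamma_2 X_2 + \gamma_{12}\, X_1 X_2 \\[4pt]
\gamma_0 &= \beta_0 + \beta_{12}\,\bar{x}_1 \bar{x}_2, \qquad
\gamma_1 = \beta_1 - \beta_{12}\,\bar{x}_2, \\
\gamma_2 &= \beta_2 - \beta_{12}\,\bar{x}_1, \qquad
\gamma_{12} = \beta_{12}
\end{align*}
```

Both forms give the same predictions; only the meaning of the main-effect coefficients changes. In the uncentered form, γ1 is the X1 slope at X2 = 0, i.e. in the OFF state. In the centered form, β1 is the X1 slope at X2 = x̄2, which would be 15 for a balanced design over 0, 15, 30. That shift in reference point is what I mean by the coefficients being harder to relate to the OFF/absent condition.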