Hi @OrdinaryShark22,
It's a good idea to use AICc/BIC as metrics to evaluate and compare models, but in my opinion they shouldn't be the sole basis for model selection. A single metric always focuses on only one aspect of the model fit, so it's best to use and compare several relevant metrics.
Response Surface models are optimization designs, where the emphasis is on predictive performance/optimization. Several metrics can help you figure out which model(s) seem worth considering:
- R²/R² adjusted (explanatory metric): It's interesting to consider R² and R² adjusted simultaneously, as you may want to know how much variability is explained by the model. The higher the R², the larger the share of variability your model explains. However, R² can increase simply by adding more and more terms to the model (even if they are not relevant/important), while R² adjusted includes a penalty for the number of terms. So the smaller the difference between R² and R² adjusted, the better the balance between fit quality and number of terms in the model.
Looking at your two models, the "reduced model from original script" seems the better choice on this metric: higher R² and R² adjusted, with a small difference between R² and R² adjusted (similar to the second model).
- RMSE (predictive metric): Since you assumed a response surface model, it's also interesting to consider the predictive performance of the model. RMSE (Root Mean Square Error) is one possible metric for evaluating and comparing the predictive performance of your models. The lower the RMSE, the better the precision and predictive performance.
Looking at your two models, the "reduced model from original script" gives a more precise fit than the other: it has the lower RMSE (0.82 vs. 1.16).
Then you have Information Criterion metrics, like AICc and BIC, which help compare models by balancing likelihood against model complexity. In your case, these criteria are higher for the first model, reflecting its larger number of terms. However, the BIC values of the two models are very close (82.5 vs. 79.1), so there is no strong benefit in choosing one over the other based on this metric alone (see the small sketch below for how all of these metrics are computed).
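If it helps to see the definitions in one place, here is a minimal sketch (in Python, outside JMP) of how R², R² adjusted, RMSE, AICc and BIC are computed for a least-squares fit. The data and model terms below are invented purely for illustration, and parameter-counting conventions for AICc/BIC differ slightly between packages, so treat the numbers as illustrative rather than a reproduction of JMP's report.

```python
# Illustrative sketch: compute the comparison metrics for two least-squares models.
# The data below are made up; they are not your experiment.
import numpy as np

def fit_metrics(X, y):
    """Fit y = X b by least squares and return the comparison metrics."""
    n, p = X.shape                        # p includes the intercept column
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    rss = float(resid @ resid)
    tss = float(((y - y.mean()) ** 2).sum())

    r2 = 1 - rss / tss
    r2_adj = 1 - (rss / (n - p)) / (tss / (n - 1))   # penalizes extra terms
    rmse = np.sqrt(rss / (n - p))                    # root mean square error

    k = p + 1                                        # parameters incl. error variance (one common convention)
    loglik = -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)
    aic = -2 * loglik + 2 * k
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)       # small-sample correction
    bic = -2 * loglik + k * np.log(n)
    return {"R2": r2, "R2_adj": r2_adj, "RMSE": rmse, "AICc": aicc, "BIC": bic}

# Hypothetical example: compare a fuller model against a reduced one
rng = np.random.default_rng(1)
A, B = rng.uniform(-1, 1, 20), rng.uniform(-1, 1, 20)
y = 3 + 2 * A - 1.5 * B + A * B + rng.normal(0, 0.5, 20)

X_full = np.column_stack([np.ones(20), A, B, A * B, A**2, B**2])
X_reduced = np.column_stack([np.ones(20), A, B, A * B])

print("full:   ", fit_metrics(X_full, y))
print("reduced:", fit_metrics(X_reduced, y))
```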
You could also use p-values for term selection, but again this metric shouldn't be the only driver of your decision. RSM focuses on predictive performance, and building an RSM with few factors implies that you have probably already filtered out unimportant terms in a previous screening phase. Since you're in an optimization phase, I would avoid completely removing a factor and all of its related terms, which is what happens in model 2: you removed the main effect of B and every effect containing B, which seems a rather "aggressive" decision for a model with few factors and a focus on optimization and predictive performance.
Finally, to help your choice, please check the assumptions behind the use of linear regression models: Regression Model Assumptions | Introduction to Statistics | JMP
In your "reduced model from original script", there doesn't seem to be a pattern in the residual plot.
However, in your simpler model, the residuals look a little more suspicious, with some trend and possible curvature for positive residual values.
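JMP already shows the Residual by Predicted plot in the Fit Model report, but if you ever want to recreate this check outside JMP, a minimal sketch could look like the following. The data here are invented placeholders, not your experiment.

```python
# Sketch of a residual-by-predicted check for a least-squares fit (invented data).
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
x = rng.uniform(-1, 1, 20)
y = 1 + 2 * x + rng.normal(0, 0.3, 20)

X = np.column_stack([np.ones_like(x), x])    # intercept + one factor
b, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = X @ b

plt.scatter(pred, y - pred)                  # look for trends or curvature: ideally none
plt.axhline(0, color="gray", linewidth=1)
plt.xlabel("Predicted response")
plt.ylabel("Residual")
plt.title("Residual by Predicted")
plt.show()
```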
In the end, always use domain expertise to guide model evaluation, comparison and selection. Statistical metrics are there to help filter out non-relevant models, but the remaining models should be useful and make sense to domain experts.
In your case, you can use both models and see how the conclusions differ: if you use the Profiler to minimize the response, both models lead to the same factor settings: A=0.6 / B=30 / C=28 / D=12:15. You might want to run a validation point at these settings to see whether the optimum found is relevant, whichever model you choose. You already have the lowest response value at these A, B, C settings at row 8, which is a good indication that the models may differ but agree on the optimization.
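The Profiler does this optimization for you, but as a cross-check you could also minimize a fitted model's prediction equation numerically. Here is a minimal sketch under invented assumptions: the quadratic coefficients and the coded factor bounds below are placeholders, not your fitted model.

```python
# Sketch: numerically minimize a hypothetical second-order response surface
# over the coded factor region, to confirm a Profiler-style optimum.
import numpy as np
from scipy.optimize import minimize

def predicted_response(x):
    a, b, c = x
    # hypothetical model: intercept + main effects + curvature + interaction
    return 5.0 - 1.2 * a + 0.4 * b - 0.8 * c + 0.9 * a**2 + 0.5 * c**2 + 0.3 * a * b

bounds = [(-1, 1)] * 3                      # coded factor ranges
res = minimize(predicted_response, x0=np.zeros(3), bounds=bounds)
print("settings at minimum (coded):", res.x.round(3))
print("predicted minimum response :", round(res.fun, 3))
```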
I hope this answer will help you,
Victor GUILLER
"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)