Hello @Nafis1991,
Ok, the values for the response make a lot more sense now and I'm able to reproduce your results. Based on these values and your questions, I think you might be interested more generally in how to validate your regression model and what to watch out for. There are great resources about this topic if you want to dive deeper:
To check your model fit and assumptions, you can take a look at:
- Linear relationship between factors and response: looking at the "Residual by Predicted Plot", there is no obvious pattern suggesting non-linearity; the residuals appear to have constant variance, to be approximately normally distributed with a mean of zero, and to be independent of one another (a code sketch reproducing these checks outside JMP follows this list).
- Normal distribution of errors: you can also check normality of the residuals with the Residual Normal Quantile Plot (accessible through the red triangle, "Row Diagnostics", "Plot Residuals by Normal Quantiles"). Again, your example shows no obvious pattern or strong deviation from normality.
- Homoscedasticity of errors: you can check this assumption with the "Residual by Predicted Plot" seen earlier. In your case, there is no evident sign of heteroscedasticity.
- Independence of the observations: there are several ways to check for correlations or multicollinearity in your data. You can use the Multivariate platform (accessible through the menu Analyze, Multivariate Methods, Multivariate) to check for correlations between your factors. As your data was generated by a DoE without any constraints or covariates, your factors are perfectly independent, with no correlations between them.
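If you ever want to reproduce these residual and correlation checks outside JMP, here is a minimal sketch in Python with statsmodels; the file name and the column names (X1, X2, X3, Area_of_Sticking) are placeholders for your own factors and response, not anything saved in your data table:

```python
# Sketch of the residual and correlation checks above, outside JMP.
# File and column names are placeholders for your own data.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt

df = pd.read_csv("doe_results.csv")  # hypothetical export of your data table
model = smf.ols("Area_of_Sticking ~ X1 * X2 * X3", data=df).fit()

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Residual by Predicted: look for curvature (non-linearity) or a funnel shape (heteroscedasticity)
axes[0].scatter(model.fittedvalues, model.resid)
axes[0].axhline(0, color="grey", linestyle="--")
axes[0].set(xlabel="Predicted", ylabel="Residual", title="Residual by Predicted")

# Residual Normal Quantile Plot: points close to the line suggest approximately normal residuals
sm.qqplot(model.resid, line="s", ax=axes[1])
axes[1].set(title="Residual Normal Quantiles")
plt.tight_layout()
plt.show()

# Correlations between factors: near-zero off-diagonal values for an orthogonal DoE
print(df[["X1", "X2", "X3"]].corr())
```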
You can also check multicollinearity by displaying the Variance Inflation Factors (VIF) of the terms in your model: go into the "Parameter Estimates" panel, right-click on the table displaying the results, click on "Columns", and then choose "VIF". High VIFs indicate a collinearity issue among the terms in the model. While multicollinearity does not reduce a model's overall predictive power, it inflates the standard errors of the coefficients, so individual terms may appear not statistically significant simply because the inputs are not independent. In your case there is nothing to worry about: the VIFs are all equal to 1, so the effect terms are not correlated.
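For completeness, the same VIF idea can be reproduced in code; this is just an illustrative sketch (not JMP's implementation), again with placeholder file and column names:

```python
# Illustrative VIF computation outside JMP (placeholder names).
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("doe_results.csv")  # hypothetical export of your data table
model = smf.ols("Area_of_Sticking ~ X1 + X2 + X3", data=df).fit()

# VIF of each term in the model matrix (the intercept column is skipped)
X = model.model.exog
for i, term in enumerate(model.model.exog_names):
    if term == "Intercept":
        continue
    print(f"{term}: VIF = {variance_inflation_factor(X, i):.2f}")  # values near 1 mean no collinearity
```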
As mentioned earlier, the Lack of Fit test can be helpful to assess whether your model fits the data well. If it does not (indicated by a statistically significant p-value), this may be a sign that a term is missing from the model (such as a higher-order term, an interaction or a quadratic effect), or that some terms have been incorrectly included in the model.
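To illustrate what is behind that test: the residual sum of squares is split into pure error (variation among replicated runs at identical factor settings) and lack of fit, and their ratio is tested with an F statistic. A rough sketch of that computation, assuming your design contains replicated runs and using the same placeholder names as above:

```python
# Sketch of the Lack of Fit computation; it only makes sense if some runs are replicated.
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

df = pd.read_csv("doe_results.csv")  # hypothetical export of your data table
factors = ["X1", "X2", "X3"]
model = smf.ols("Area_of_Sticking ~ X1 + X2 + X3", data=df).fit()

# Pure error: variation of the response within groups of identical factor settings
groups = df.groupby(factors)["Area_of_Sticking"]
ss_pure_error = groups.apply(lambda y: ((y - y.mean()) ** 2).sum()).sum()
df_pure_error = len(df) - groups.ngroups

# Lack of fit: whatever residual variation is left beyond pure error
ss_lack_of_fit = model.ssr - ss_pure_error
df_lack_of_fit = model.df_resid - df_pure_error

f_stat = (ss_lack_of_fit / df_lack_of_fit) / (ss_pure_error / df_pure_error)
p_value = stats.f.sf(f_stat, df_lack_of_fit, df_pure_error)
print(f"Lack of Fit F = {f_stat:.3f}, p = {p_value:.4f}")  # a small p-value suggests a missing term
```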
There are several metrics and statistics that help you assess whether your model is reliable and relevant, depending on your objective: the statistical significance of the model (Analysis of Variance (jmp.com)), the Summary of Fit (with R², adjusted R², RMSE), and possibly other indicators accessible through other platforms (such as information criteria in the Generalized Regression platform).
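If it helps to see what sits behind those numbers, the same kind of summary (R², adjusted R², RMSE, overall F-test and information criteria) can be pulled out of any fitted least-squares model; a sketch with the same placeholder names:

```python
# Sketch: summary-of-fit style metrics from a fitted model (placeholder names).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("doe_results.csv")  # hypothetical export of your data table
model = smf.ols("Area_of_Sticking ~ X1 + X2 + X3", data=df).fit()

print(f"R-squared          : {model.rsquared:.4f}")
print(f"Adjusted R-squared : {model.rsquared_adj:.4f}")
print(f"RMSE               : {np.sqrt(model.mse_resid):.4f}")    # same scale as the response
print(f"F-test p-value     : {model.f_pvalue:.4g}")              # overall significance of the model
print(f"AIC / BIC          : {model.aic:.1f} / {model.bic:.1f}")  # information criteria for model comparison
```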
"All models are wrong, but some are useful" (George E. P. Box) :
Note that you can often find several models that fit the data equally well, so using domain expertise in combination with statistics can help you compare and select the most interesting model(s). Just for illustration, I have saved 2 scripts in your data table showing 2 similar but different models (with their prediction formulas added in the table):
- "Fit Least Squares Area of Sticking 2 [Full Model]" : Model with all terms (main effects, 2-factors interactions and one 3-factors interaction).
- "Fit Least Squares Area of Sticking 2 [Reduced Model 1]" : Model with all main effects and 2-factors interactions. No 3-factors interaction.
You can see that the results from these two models are very similar, so domain expertise can help you decide between them, or you could carry both (or more) forward.
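If you want to reproduce this kind of full-versus-reduced comparison in code, a partial F-test on the nested models (plus a look at the information criteria) gives a similar kind of answer; the formulas below are placeholders standing in for your actual terms, not the saved JMP scripts themselves:

```python
# Sketch: comparing a full model (with the three-factor interaction) to a reduced one (placeholder terms).
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

df = pd.read_csv("doe_results.csv")  # hypothetical export of your data table

full = smf.ols("Area_of_Sticking ~ X1 * X2 * X3", data=df).fit()             # main effects + 2- and 3-factor interactions
reduced = smf.ols("Area_of_Sticking ~ (X1 + X2 + X3) ** 2", data=df).fit()   # drops the 3-factor interaction

# Partial F-test: a large p-value suggests the 3-factor interaction adds little
print(anova_lm(reduced, full))

# Information criteria: lower is better, small differences mean the models are practically equivalent
print(f"AIC full = {full.aic:.1f}, AIC reduced = {reduced.aic:.1f}")
```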
I hope this (long) answer addresses your questions (and probably a few more),
Victor GUILLER
L'Oréal Data & Analytics
"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)