Hi @MJZ82,
The final p-values of the main effects increase because they are calculated with a different number of degrees of freedom than in the first stage alone. In the first stage, only main effects are studied, so more degrees of freedom are available in the design, hence the use of a 0.05 threshold to detect main effects.
In the second stage, some degrees of freedom have already been used to include the main effects in the model, so higher-order effects are entered with a higher p-value threshold (often 0.2) because fewer degrees of freedom are left to test them the way main effects were tested in stage 1. You can read more about the methodology here: Statistical Details for the Fit Definitive Screening Platform
In the final combined model panel, the p-values are calculated as if the terms of this model had been entered simultaneously in a standard least squares model, hence the different (and higher) p-values compared with those calculated in each of the two stages. The profiler displayed at the end corresponds to this final combined model.
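To illustrate the difference between the stage-wise tests and the combined fit, here is a minimal Python sketch with made-up data (it is not JMP's exact algorithm; the factor names, effect sizes, and run count are invented). The second-order terms are first tested against the stage-1 residuals, then all retained terms are refitted together, which leaves fewer residual degrees of freedom per term and generally raises the p-values:

```python
# Minimal sketch (not JMP's exact algorithm) of why combined-model p-values
# differ from the stage-wise ones. Data and column names are made up.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 17
X = pd.DataFrame(rng.choice([-1.0, 0.0, 1.0], size=(n, 3)),
                 columns=["X1", "X2", "X3"])
y = 3 * X["X1"] - 2 * X["X2"] + 1.5 * X["X1"] * X["X2"] + rng.normal(0, 1, n)

# Stage 1: main effects only (more residual df available to test them)
stage1 = sm.OLS(y, sm.add_constant(X)).fit()
print(stage1.pvalues)

# Stage 2: second-order terms fitted to the stage-1 residuals
X2nd = pd.DataFrame({"X1*X2": X["X1"] * X["X2"], "X1^2": X["X1"] ** 2})
stage2 = sm.OLS(stage1.resid, sm.add_constant(X2nd)).fit()
print(stage2.pvalues)

# Combined model: all retained terms entered simultaneously in one least
# squares fit -- fewer residual df per term, so p-values are generally larger
Xall = pd.concat([X, X2nd], axis=1)
combined = sm.OLS(y, sm.add_constant(Xall)).fit()
print(combined.pvalues)
```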
I would not reduce the model further based on p-values from the Combined Model Parameter Estimates panel. The methodology used by Fit Definitive Screening is sequential: it first identifies the main effects that influence the response, then identifies active second-order effects based on the residuals of the main-effects model fitted previously. The p-values calculated with a simultaneous inclusion of the terms (as in the Combined Model Parameter Estimates panel) do not reflect how the terms were identified and entered, so they may be biased, and the decision to remove terms should not be based solely on this information. It is also important to respect effect heredity when fitting models from DoEs: do not remove a main effect if a significant interaction or quadratic effect involving the same factor is in the model.
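As a simple illustration of the heredity rule, here is a hypothetical Python helper (not a JMP feature) that lists the main effects a model must keep given its interaction and quadratic terms; the term-naming convention ("X1*X2", "X3^2") is only an assumption for the example:

```python
# Hypothetical helper (not a JMP feature): which main effects must stay in the
# model to respect effect heredity, given the retained higher-order terms.
def mains_required_by_heredity(model_terms):
    """model_terms: e.g. ["X1", "X3", "X1*X2", "X3^2"]"""
    required = set()
    for term in model_terms:
        if "*" in term:                    # two-factor interaction
            required.update(term.split("*"))
        elif "^" in term:                  # quadratic term
            required.add(term.split("^")[0])
    return required

print(mains_required_by_heredity(["X1", "X3", "X1*X2", "X3^2"]))
# contains 'X1', 'X2', 'X3': X2 must be added back (or the interaction dropped)
```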
You can still try different models using the various modeling platforms available (the Fit Definitive Screening platform, the Fit Two Level Screening platform, Standard Least Squares models, Generalized Regression models, ...) and compare the terms those methods have in common and the ones that differ. The topic of modeling is much broader (and sometimes more complicated) than "only" relying on p-values. Depending on your objective(s), you may follow different paths for model evaluation and selection:
- Explanatory model: In an explanatory mode, you focus on the terms that have some influence on the response(s), so you might evaluate the need to include each term based on statistical significance (with the help of p-values and a predefined threshold such as 0.05) and practical significance (size of the estimates, selection based on domain expertise). R² and adjusted R² (and the difference between the two, which should be small) can be good metrics to understand how much variation is explained by the identified terms and to select relevant model(s) to explain the system under study.
- Predictive model: In a predictive mode, you focus on the terms that help you minimize prediction errors, so you might evaluate the need to include each term based on how it improves predictive performance, through visualizations such as the actual vs. predicted plot and the size of the errors (residual plot). RMSE can be a good metric to assess which model(s) have the best predictive performance (the goal is to minimize RMSE). A short sketch illustrating R², adjusted R², and RMSE follows this list.
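To make these metrics concrete, here is a minimal numpy sketch that computes R², adjusted R², and RMSE by hand; y_obs, y_pred, and the parameter count p are placeholders you would take from your own fits:

```python
# Minimal sketch of the fit metrics mentioned above, computed by hand.
import numpy as np

def fit_metrics(y_obs, y_pred, p):
    """p = number of estimated parameters, including the intercept."""
    y_obs, y_pred = np.asarray(y_obs, float), np.asarray(y_pred, float)
    n = y_obs.size
    sse = np.sum((y_obs - y_pred) ** 2)          # residual sum of squares
    sst = np.sum((y_obs - y_obs.mean()) ** 2)    # total sum of squares
    r2 = 1 - sse / sst
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - p)    # penalizes extra terms
    rmse = np.sqrt(sse / (n - p))                # root mean square error
    return {"R2": r2, "R2_adj": r2_adj, "RMSE": rmse}

print(fit_metrics([10.1, 11.8, 14.2, 15.9], [10.0, 12.0, 14.0, 16.0], p=2))
```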
You might also be interested in a combination of the two approaches, in which case other metrics can help you evaluate and select models, such as information criteria (AICc, BIC) that find a compromise between a model's predictive performance and its complexity. When evaluating and selecting a model based on these criteria, the lower the better.
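For completeness, here is a minimal sketch of AICc and BIC for a least squares fit, using one common convention for counting parameters (JMP's exact constants may differ, but the ranking of models and the "lower is better" reading are the same):

```python
# Minimal sketch of AICc and BIC for a least squares fit. k counts all
# estimated parameters (model terms + intercept + error variance); this is one
# common convention and may not match JMP's reported values exactly.
import numpy as np

def info_criteria(y_obs, y_pred, n_terms):
    y_obs, y_pred = np.asarray(y_obs, float), np.asarray(y_pred, float)
    n = y_obs.size
    sse = np.sum((y_obs - y_pred) ** 2)
    k = n_terms + 2                               # terms + intercept + variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sse / n) + 1)
    aic = -2 * loglik + 2 * k
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)    # small-sample correction
    bic = -2 * loglik + k * np.log(n)
    return {"AICc": aicc, "BIC": bic}

print(info_criteria([10.1, 11.8, 14.2, 15.9, 17.7],
                    [10.0, 12.0, 14.0, 16.0, 18.0], n_terms=1))
```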
I hope this answer helps you,
Victor GUILLER
"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)