QW, here are my thoughts:
1. You find significant factors after removing the insignificant terms because you have biased the MSE. When you remove an insignificant term (which likely has a small SS), its SS and its DF are pooled into the error term. Since the MSE is SSerror/DFerror, pooling small SS values along with their DFs reduces the MSE (perhaps unrealistically). The F-value is MSfactor/MSerror, so if you unrealistically reduce the MSE, you inflate the F-value and subsequently lower the p-value. My word of advice is to ALWAYS consider what sources of variation you are comparing.
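To make the arithmetic in point 1 concrete, here is a minimal sketch in Python. All of the SS and DF values are made up for illustration (they are not from your data), and the function name `f_value` is just a hypothetical helper.

```python
# Hypothetical ANOVA numbers, chosen only to illustrate the arithmetic.
ss_factor, df_factor = 40.0, 1                  # factor of interest
ss_error, df_error = 24.0, 4                    # error estimate before any terms are removed
removed_terms = [(1.0, 1), (0.5, 1), (1.5, 1)]  # (SS, DF) of terms judged insignificant


def f_value(ss_factor, df_factor, ss_error, df_error):
    """Return (MSE, F), where F = MSfactor / MSerror."""
    ms_factor = ss_factor / df_factor
    mse = ss_error / df_error
    return mse, ms_factor / mse


# Before removal: the error term holds only its original SS and DF.
mse_before, f_before = f_value(ss_factor, df_factor, ss_error, df_error)

# After removal: each removed term's SS and DF are pooled into the error term.
pooled_ss = ss_error + sum(ss for ss, _ in removed_terms)
pooled_df = df_error + sum(df for _, df in removed_terms)
mse_after, f_after = f_value(ss_factor, df_factor, pooled_ss, pooled_df)

print(f"before: MSE = {mse_before:.3f}, F = {f_before:.2f} on ({df_factor}, {df_error}) DF")
print(f"after:  MSE = {mse_after:.3f}, F = {f_after:.2f} on ({df_factor}, {pooled_df}) DF")
```

With these made-up numbers the MSE drops from 6.0 to about 3.86 and the F-value rises from about 6.67 to about 10.37. The 5% critical F also gets smaller with the extra error DF (about 7.71 on (1, 4) DF versus about 5.59 on (1, 7) DF), so the same factor goes from "not significant" to "significant" without any new data.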
2. Normal/Half Normal plots (Daniel plots) are not necessarily straightforward to interpret. Russell Lenth's PSE (pseudo standard error) is an attempt to help interpret the charts, but the PSE is not always useful. You have to consider which effects are a function of the random errors (these should fall on a relatively straight line) and which effects are assignable (these fall outside the random distribution of errors). If your Normal plot has an S-curve shape or is broken, this is an indication of a significant noise effect (there may be more than one distribution of random errors). From your picture, it looks like the greyed out + at the top of the normal plot is significant (my guess is this is Column2(x)). It really helps to have the Pareto plot alongside the normal plot. Assess both statistical signals and practical significance.
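For reference, the PSE mentioned in point 2 (Lenth's pseudo standard error) is simple enough to compute by hand. The sketch below uses made-up effect estimates with hypothetical names E1 to E7, not the effects from your experiment.

```python
from statistics import median


def lenth_pse(effects):
    """Lenth's pseudo standard error for a collection of effect estimates."""
    abs_effects = [abs(e) for e in effects]
    s0 = 1.5 * median(abs_effects)                      # initial scale estimate
    trimmed = [a for a in abs_effects if a < 2.5 * s0]  # drop likely-active effects
    return 1.5 * median(trimmed)


# Hypothetical effect estimates from a saturated 8-run design (7 effects).
effects = {"E1": 0.3, "E2": -0.5, "E3": 8.0, "E4": 0.2,
           "E5": -0.4, "E6": 0.6, "E7": -0.1}

pse = lenth_pse(effects.values())
ratios = {name: abs(e) / pse for name, e in effects.items()}

# Largest ratios first, the same ordering a Pareto plot would show.
for name, r in sorted(ratios.items(), key=lambda kv: -kv[1]):
    print(f"{name}: |effect| / PSE = {r:.2f}")
```

These ratios are usually compared against a t quantile with roughly m/3 degrees of freedom (m = number of effects); that step is left out here to keep the sketch to the standard library. Note that the PSE leans on effect sparsity, so it can mislead when many effects are active, which is one reason it is not always useful.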
3. I think you have a good start on building your knowledge (and perhaps a mathematical model). I use the Practical>Graphical>Quantitative method:
First, how much variation in the data did you create in the experiment? Is the data of any practical value (from the SME perspective)? How does the data compare with what you predicted? How does it compare with the hypotheses you used to determine which factors to include in the study? ANOG and MR charts are excellent for recognizing patterns correlating factor effects with the response variables. Never turn off engineering.
Second, graph/plot the data. JMP's Graph Builder can be very useful. Are there any unusual data points (treatments)? This is also where Normal plots and Pareto plots are useful (though they are most useful with saturated models).
Lastly, quantitative (effects, ANOVA, regression, etc.).
4. Regarding model refinement with quantitative methods for DOE data: There are a number of statistics/plots we use to do this:
- Delta between R-square and R-square Adjusted (this is more important than either statistic on its own)
- RMSE (smaller is better)
- p-values (be careful of biased values, consider what is being compared)
- Residuals (to test the NID(mean, variance) assumptions)
In DOE, I don't recommend stepwise regression (this is an additive model-building approach most useful for data mining or regressing on observational/historical data, and of course you have the additional concern of collinearity). I recommend a subtractive approach (start saturated and remove terms). And your step 3 sounds great!
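If it helps to see the statistics from point 4 side by side, here is a rough sketch comparing a model with all terms against a reduced one. It assumes a replicated 2^2 design with coded -1/+1 factors; the responses and the helper `fit_summary` are invented for illustration, and this is not what JMP does internally.

```python
from math import sqrt

# Hypothetical replicated 2^2 design (8 runs), coded -1/+1, with made-up responses.
A = [-1, 1, -1, 1, -1, 1, -1, 1]
B = [-1, -1, 1, 1, -1, -1, 1, 1]
AB = [a * b for a, b in zip(A, B)]
y = [10.1, 14.2, 10.5, 14.0, 9.8, 13.9, 10.3, 14.4]


def fit_summary(columns, y):
    """Fit y on orthogonal -1/+1 columns and return summary statistics.

    For a balanced, orthogonal two-level design, each least-squares
    coefficient is simply sum(x * y) / n, so no matrix algebra is needed.
    """
    n, p = len(y), len(columns)
    mean_y = sum(y) / n
    coefs = {name: sum(x * yi for x, yi in zip(col, y)) / n
             for name, col in columns.items()}
    pred = [mean_y + sum(coefs[name] * col[i] for name, col in columns.items())
            for i in range(n)]
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    df_error = n - p - 1
    r2 = 1 - sse / sst
    r2_adj = 1 - (sse / df_error) / (sst / (n - 1))
    return {"coefs": coefs, "r2": r2, "r2_adj": r2_adj,
            "delta": r2 - r2_adj, "rmse": sqrt(sse / df_error)}


full = fit_summary({"A": A, "B": B, "AB": AB}, y)  # start with all terms
reduced = fit_summary({"A": A}, y)                 # after removing B and AB

for label, s in (("full", full), ("reduced", reduced)):
    print(f"{label:8s} R2 = {s['r2']:.4f}  R2adj = {s['r2_adj']:.4f}  "
          f"delta = {s['delta']:.4f}  RMSE = {s['rmse']:.3f}")
```

The point is to compare the rows: watch how the delta and the RMSE move as terms come out, and look at the residuals before trusting either model.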
"All models are wrong, some are useful" G.E.P. Box