Here are my thoughts in general (specific advice would require a more thorough understanding of the situation):
First, I will assume you have an un-replicated 2^4 factorial (16 treatment combinations, one run each), with the model containing all possible terms up to the 4th-order interaction.
You bring up an important concept with respect to experimentation and the F-test (or any significance test, for that matter). The typical significance test in experimentation compares the MS of a model term with the MSe (mean square error); this ratio is the F-ratio or F-value. The important questions are: How was the error term estimated? How representative of the true error is that estimate? Are the comparisons being made useful and representative?
If you remove insignificant terms from the model and pool them into error (as lack of fit), you are potentially biasing the MSe low, because by construction you are pooling the smallest sums of squares into the error term (small SS divided by their DF). When you then compare the MS of a model term with that smaller MSe, you get inflated F-values (and smaller p-values).
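This bias is easy to see numerically. Below is a minimal sketch (with simulated, purely made-up data, not your experiment): 15 effect sums of squares are drawn from pure noise, the 12 smallest are pooled as "error," and the largest effect is then tested against that pooled MSe. Even though nothing is truly active, the selection makes the MSe small and the F-ratio large.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical illustration: 15 effect SS from a pure-noise unreplicated
# 2^4 factorial. Each SS ~ sigma^2 * chi-square(1), with sigma^2 = 1,
# so an honest error mean square should be about 1.
ss = rng.chisquare(df=1, size=15)

# "Pooling" strategy: keep the 3 largest terms in the model and pool the
# 12 smallest SS into the error term (12 DF).
ss_sorted = np.sort(ss)
mse_pooled = ss_sorted[:12].sum() / 12   # biased low: built from the smallest SS
f_largest = ss_sorted[-1] / mse_pooled   # F-ratio for the biggest "effect"
p_value = stats.f.sf(f_largest, 1, 12)

print(f"pooled MSe = {mse_pooled:.3f}  (true sigma^2 = 1)")
print(f"F for largest effect = {f_largest:.2f}, p = {p_value:.4f}")
```

Because the error term is assembled from the smallest SS, the pooled MSe sits below the true error variance, and a pure-noise contrast can look "significant."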
A quick read:
https://www.additive-net.de/images/software/minitab/downloads/SCIApr2004MSE.pdf
I recommend first assessing the practical significance of the model terms using a Pareto chart of effects, AND using Daniel's method of evaluating statistical significance for un-replicated experiments (normal/half-normal plots), perhaps augmented with Bayes plots (Box & Meyer). This will give you both practical significance and statistical significance without the pooling bias.
Daniel, Cuthbert (1959), "Use of Half-Normal Plots in Interpreting Factorial Two-Level Experiments," Technometrics, Vol. 1, No. 4, pp. 311-341.
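The mechanics behind a half-normal plot can be sketched in a few lines: estimate all effect contrasts, sort the absolute effects, and pair them with half-normal plotting positions (the points you would graph; active effects fall off the straight line through the near-zero ones). The sketch below uses a made-up 2^3 example for brevity; the same recipe applies to a 2^4 with 15 effects.

```python
import itertools
import numpy as np
from scipy import stats

# Hypothetical unreplicated 2^3 factorial (8 runs), -1/+1 coding,
# standard order. The response values are invented for illustration.
design = np.array([[(run >> bit & 1) * 2 - 1 for bit in range(3)]
                   for run in range(8)])          # columns: A, B, C
y = np.array([60., 72., 54., 68., 52., 83., 45., 80.])

# All 7 effect contrasts: A, B, C, AB, AC, BC, ABC.
names, effects = [], []
for r in (1, 2, 3):
    for combo in itertools.combinations(range(3), r):
        names.append("".join("ABC"[i] for i in combo))
        contrast = np.prod(design[:, list(combo)], axis=1)
        effects.append(contrast @ y / 4.0)        # effect = contrast sum / (n/2)
effects = np.array(effects)

# Half-normal plotting positions (Daniel): quantiles at (i - 0.5) / m.
order = np.argsort(np.abs(effects))
m = len(effects)
quantiles = stats.halfnorm.ppf((np.arange(1, m + 1) - 0.5) / m)

for q, idx in zip(quantiles, order):
    print(f"{names[idx]:>3}: |effect| = {abs(effects[idx]):6.2f}  "
          f"half-normal quantile = {q:.2f}")
```

Plotting |effect| against the quantiles (by hand or with any plotting library) shows most effects hugging a line through the origin, with the active ones standing apart; no error estimate from pooling is required.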
Once you have determined the factors/interactions that are active in the experiment, simplify/reduce the model. The purpose of simplifying the model is twofold:
1. to get a more useful model for iteration and prediction
2. to get residuals to help assess model adequacy and whether any assumptions were violated.
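Both purposes can be sketched with a reduced-model refit. Continuing the made-up 2^3 example (and hypothetically supposing A and AC were judged active), fit only those terms and inspect the residuals; in a two-level factorial each regression coefficient is half the corresponding effect.

```python
import numpy as np

# Same hypothetical 2^3 design and invented response as before.
design = np.array([[(run >> bit & 1) * 2 - 1 for bit in range(3)]
                   for run in range(8)])
y = np.array([60., 72., 54., 68., 52., 83., 45., 80.])

# Reduced model with only the (hypothetically) active terms: A and AC.
A, C = design[:, 0], design[:, 2]
X = np.column_stack([np.ones(8), A, A * C])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
resid = y - fitted

# The reduced model is what you iterate and predict with (purpose 1);
# the residuals feed normal plots, residuals-vs-fitted, and
# residuals-vs-run-order checks of the assumptions (purpose 2).
print("coefficients (intercept, A, AC):", np.round(beta, 3))
print("residuals:", np.round(resid, 3))
```

Systematic patterns in the residuals (trends, funnels, outliers) flag an inadequate model or violated assumptions; the point of the refit is diagnosis and prediction, not fresh significance tests.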
Note: you do not re-assess statistical significance with the simplified model. Those terms were already selected using the same data, so re-testing them against the now-smaller MSe would overstate their significance.
"All models are wrong, some are useful" G.E.P. Box