Victor, I agree completely with your first step, but I would like to propose an alternative to your step #2. You should not fit a single full model, because a split-plot design has 2 different error structures (whole plot and subplot), and one model leads to inappropriate comparisons. For example, testing the whole-plot (WP) factor(s) against the subplot MSE for statistical significance is comparing apples to oranges. That p-value is useless. You should essentially treat the whole plot and the subplot as if they were 2 different experiments. Please read Box and Jones. Also Jones and Nachtsheim, Anderson and McLean, Sanders and Coleman, and Bisgaard.
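To make the apples-to-oranges point concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the data are made up, the design is a small balanced one (one hard-to-change WP factor A, 2 whole plots per level of A, one subplot factor B), and the function name `split_plot_strata` is hypothetical. It computes the F ratio for A twice, once against the whole-plot error and once against the subplot error.

```python
from statistics import mean

# Made-up data for illustration only: (whole_plot_id, A, B, response).
# A is the whole-plot factor, B is the subplot factor.
runs = [
    (1, -1, -1, 10.0), (1, -1, +1, 12.0),
    (2, -1, -1, 14.0), (2, -1, +1, 16.0),
    (3, +1, -1, 15.0), (3, +1, +1, 18.0),
    (4, +1, -1, 19.0), (4, +1, +1, 21.0),
]

def split_plot_strata(runs):
    """Return both error mean squares and the two F ratios for factor A.

    Assumes a balanced split-plot design with completely randomized
    whole plots, one whole-plot factor and one subplot factor.
    """
    grand = mean(r[3] for r in runs)
    a_mean = {a: mean(y for _, a2, _, y in runs if a2 == a)
              for a in {r[1] for r in runs}}
    wp_mean = {w: mean(y for w2, _, _, y in runs if w2 == w)
               for w in {r[0] for r in runs}}
    cell_mean = {(a, b): mean(y for _, a2, b2, y in runs if (a2, b2) == (a, b))
                 for a, b in {(r[1], r[2]) for r in runs}}

    # Sum of squares for A, and for whole plots nested within A (WP error).
    ss_a = sum((a_mean[a] - grand) ** 2 for _, a, _, _ in runs)
    ss_wp = sum((wp_mean[w] - a_mean[a]) ** 2 for w, a, _, _ in runs)
    # Subplot error: what is left within whole plots after B and A*B.
    ss_sp = sum((y - wp_mean[w] - cell_mean[(a, b)] + a_mean[a]) ** 2
                for w, a, b, y in runs)

    n_wp, n_a = len(wp_mean), len(a_mean)
    n_b = len({r[2] for r in runs})
    df_wp = n_wp - n_a
    df_sp = (n_wp - n_a) * (n_b - 1)

    ms_a = ss_a / (n_a - 1)
    ms_wp = ss_wp / df_wp
    ms_sp = ss_sp / df_sp
    return {
        "ms_wp_error": ms_wp,
        "ms_sp_error": ms_sp,
        "F_A_vs_wp_error": ms_a / ms_wp,  # appropriate test for A
        "F_A_vs_sp_error": ms_a / ms_sp,  # what a single-error model implies
    }

result = split_plot_strata(runs)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

With these made-up numbers, A tested against the whole-plot error gives F of about 3.9 on (1, 2) degrees of freedom, while the same effect tested against the subplot error gives F = 441. The effect is identical in both cases; only the yardstick changed, which is why the second number says nothing useful about A.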
Box, G. E. P., & Jones, S. (1992). Split-plot designs for robust product experimentation. Journal of Applied Statistics, 19(1).
Jones, B., & Nachtsheim, C. J. (2009). Split-Plot Designs: What, Why, and How. Journal of Quality Technology, 41(4), 340–361.
Anderson, V., & McLean, R. (1974). Design of Experiments: A Realistic Approach. Marcel Dekker. ISBN 0-8247-7493-0.
Sanders, D., & Coleman, J. (2003). Recognition and Importance of Restrictions on Randomization in Industrial Experimentation. Quality Engineering, 15(4), 533–543. https://doi.org/10.1081/QEN-120018386
Bisgaard, S. (2000). The Design and Analysis of 2^(k-p) × 2^(q-r) Split Plot Experiments. Journal of Quality Technology, 32(1), 39–56. https://doi.org/10.1080/00224065.2000.11979970
"All models are wrong, some are useful" G.E.P. Box