My initial reaction is: why are you testing at 4 levels if you are screening? For the most part, the objective of a screening design is to examine a large number of factors in an efficient number of treatments. This relies on the principles of effect sparsity, hierarchy, and heredity. A 4-level factor allows estimation of higher-order effects for that factor (it consumes 3 degrees of freedom rather than 1), which goes beyond what screening designs are intended to do. Can you pick the extremes of the 4 levels (which grades do you think will be the most different?) and test that factor at 2 levels?
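To put rough numbers on the cost, here is a minimal Python sketch. It assumes a made-up setup of "grade" plus three other two-level factors (the factor count and the helper names `full_factorial` and `main_effect_df` are my own illustration, not from your experiment). It compares run counts and main-effect degrees of freedom with grade at 4 levels versus 2, and then builds a half fraction of the two-level design using the generator D = ABC.

```python
from itertools import product

def full_factorial(levels):
    """Return every level combination for factors with the given level counts."""
    return list(product(*[range(n) for n in levels]))

def main_effect_df(levels):
    """Degrees of freedom used by main effects: (levels - 1) per factor."""
    return sum(n - 1 for n in levels)

# Hypothetical setup: 'grade' plus three other two-level factors.
four_level = [4, 2, 2, 2]  # grade tested at all four levels
two_level = [2, 2, 2, 2]   # grade tested at its two extremes only

runs_four = len(full_factorial(four_level))
runs_two = len(full_factorial(two_level))

# Half fraction of the two-level design in -1/+1 coding,
# with the fourth column defined by the generator D = A*B*C.
half_fraction = [(a, b, c, a * b * c) for a, b, c in product((-1, 1), repeat=3)]

print("grade at 4 levels:", runs_four, "runs,", main_effect_df(four_level), "main-effect df")
print("grade at 2 levels:", runs_two, "runs,", main_effect_df(two_level), "main-effect df")
print("half fraction (D = ABC):", len(half_fraction), "runs")
```

With these assumed factors, the full factorial drops from 32 runs to 16 when grade goes to 2 levels, and the half fraction gets it down to 8 while keeping every column balanced. Your actual numbers will depend on how many factors you really have.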
"All models are wrong, some are useful" G.E.P. Box