Hi @MRB3855 ,
Thanks for your response, and I completely agree with you. I'll try to be a little clearer about my questions regarding the normality of the responses.
I actually don't care that the responses aren't normal -- they are what they are, for several reasons that are known and not surprising. I don't believe I indicated that the non-normal responses should be normal. If I did, my apologies, as I don't think they should be anything in particular. They are what they are, and there is currently no indication that any special cause would justify removing data from the DOE.
Y__2 is non-normal because it is highly dependent (in a nonlinear way) on Y__1. Y__3 is non-normal in part because it physically cannot have a value <0, and due to the design of the DOE, some measurements are near that physical limit (it's also highly dependent on Y__1). Y__4, on the other hand, is non-normal in part because it is also highly dependent on Y__1 and has an upper detection limit of 280 -- beyond that, the results are maxed out, and by design there are some runs where the response is maxed out. This is all well and good. So far, to me, the DOE has done exactly what it was designed to do.
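To make the detection-limit mechanism concrete, here is a minimal sketch (simulated data, not from the actual DOE -- only the limit of 280 comes from the description above) showing how an upper detection limit makes an otherwise well-behaved response pile up at the limit, so the observed distribution cannot be normal:

```python
import numpy as np

# Hypothetical illustration: any "true" value above the detection limit is
# recorded at the limit, so the observed values pile up at 280.
rng = np.random.default_rng(0)
LIMIT = 280.0

true_response = rng.normal(loc=250, scale=40, size=200)  # latent values
observed = np.minimum(true_response, LIMIT)              # instrument maxes out

print(f"runs recorded at the limit: {(observed >= LIMIT).mean():.0%}")
print(f"largest observed value: {observed.max():.1f}")
```

The spike at 280 is exactly the kind of censoring that a normal-errors model cannot represent.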
My questions/concerns regarding the normality of the responses are more about how to manage the non-normal responses in the modeling step. Some resources in the JMP Community & Help that I've read seem to give conflicting recommendations, especially for split-plot designs where at least one factor is hard to change. Some resources recommend keeping the whole-plot random effects in the analysis (hence you must use the Mixed Model method -- but then you can't account for non-normal response distributions, or for the detection limit), while others suggest using the GenReg platform, where you can account for the non-normal distributions (and detection limits) but you lose the random effects in the model.
There are obviously trade-offs between the approaches, and I'm most concerned with minimizing how those trade-offs affect the analysis of the DOE when it comes to the normality of the responses. If I use the GenReg platform, I can't account for the random effects inherent in the split-plot design and am therefore susceptible to both Type I and Type II errors -- I could conclude something is there when it's not, or that something is not there when it is. On the other hand, the Mixed Model platform can't handle the non-normal response distributions (but can handle the whole-plot random effects), which can sometimes lead to predictions that aren't physically real -- like values <0, which don't make sense.
So, how does one handle this in a real example? The sample data tables in JMP are nice and ideal, providing a clear analytical path, but they don't really address gray areas like this, where the path isn't so straightforward.
Ultimately, it would be helpful to have a model with low error whose residuals are normally distributed and centered around 0. The Mixed Model platform gives this (but also gives non-physical response predictions), whereas the GenReg platform does not, but does provide physically valid response predictions. Again, how does one handle this in a real, non-ideal situation?
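For reference, the residual criterion above (centered at 0, plausibly normal) can be checked numerically as well as visually; a minimal sketch, using simulated stand-in residuals (in practice these come from whichever platform's fit you're evaluating):

```python
import numpy as np
from scipy import stats

# Stand-in residuals, simulated here only so the snippet is self-contained.
rng = np.random.default_rng(2)
residuals = rng.normal(0.0, 1.0, 100)

stat, p = stats.shapiro(residuals)           # Shapiro-Wilk normality test
print(f"mean residual: {residuals.mean():+.3f}")
print(f"Shapiro-Wilk p-value: {p:.3f}")      # small p flags non-normal residuals
```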
Keep in mind that I have several other, broader questions/concerns as well.
I hope this has helped to clear up some points, and also to direct the discussion toward the other, more general questions: augmentation, best-practice approaches to analysis when your data doesn't fit nicely into one model or the other, and why the profiler always suggests extreme settings for optimal responses -- and whether that is an indication that something larger is wrong.
DS