
SDF1 · Jun 6, 2025 03:16 PM

Hello JMP Community,

Running JMP Pro 18.1.1 on Windows 11. Apologies in advance for the long post, but I want to provide as much information as possible up front. The references to factor and response columns are for the attached anonymized data table, for those interested in looking in more detail (hence some vagueness in my description and the anonymized data table).

I have some general concerns and questions about a split-plot DOE that I'm helping my colleagues analyze. Some are not too dissimilar from those in the recent post here. Unlike the "clean" and "nice" examples in the JMP help data, this is a real-life industrial example where things don't always work out ideally.

Background: Some chemist colleagues of mine wanted to run a DOE testing six factors and measuring four different responses. The purpose of the DOE was to determine which combination of those six factors, and at what levels, could/should be used to achieve the desired results. One of the responses is what I would call, for lack of a better word, the "primary" response, Y__1 (the other responses depend on this primary response). The dependency is NOT linear for any of the other responses.

DOE design: We have a 30-run custom DOE (30 being the number of runs the chemists determined was allowable) with six factors, one of which is hard to control (oven temperature), which turns what would otherwise be a completely randomized design into a split-plot design.

X__5 is the hard-to-change oven temperature factor.
Runs are grouped into 4 whole plots with 7 or 8 runs within each whole plot.
Three of the four responses have lower design limits, with a goal to maximize their response (Y__1, Y__2, Y__3).
One response has an upper design limit, with a goal to minimize its response (Y__4).
One response has a lower detection limit of 10 and absolute lower limit of 0 (Y__3).
One response has an upper detection limit of 280 (Y__4).
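For anyone less familiar with why one hard-to-change factor turns this into a split-plot design, here is a minimal Python sketch of the restricted randomization involved. It assumes the 30 runs have already been assigned to the 4 whole plots (the 8/7/8/7 grouping and the "low"/"high" oven settings below are hypothetical, not the attached table): only the order of the whole plots and the order of runs within each whole plot are randomized, so the oven temperature changes only at whole-plot boundaries.

```python
import random

def split_plot_run_order(whole_plots, seed=None):
    """Restricted randomization for a split-plot design.

    whole_plots: list of whole plots; each whole plot is a list of runs that
    share one setting of the hard-to-change factor.
    Returns a flat run order in which every whole plot stays contiguous.
    """
    rng = random.Random(seed)
    order = [list(wp) for wp in whole_plots]  # copy so the input is untouched
    rng.shuffle(order)                        # randomize the whole-plot order
    for wp in order:
        rng.shuffle(wp)                       # randomize runs within each whole plot
    return [run for wp in order for run in wp]

# Hypothetical layout: 4 whole plots of 8, 7, 8 and 7 runs (30 in total),
# each run labelled (oven_temperature_setting, run_id).
plots = [[("low", i) for i in range(0, 8)],
         [("high", i) for i in range(8, 15)],
         [("low", i) for i in range(15, 23)],
         [("high", i) for i in range(23, 30)]]
schedule = split_plot_run_order(plots, seed=1)
```

A completely randomized design would shuffle all 30 runs freely, which could mean changing the oven temperature up to 29 times; here it changes at most 3 times.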

Observations of data: The DOE was conducted and here are some observations of the data.

Y__1 is normally distributed. Recall this is also the "primary" response.
Y__2 is NOT normally distributed, but is best fit by a SHASH distribution. JMP does have a SHASH transform function to turn it into a normally distributed data set as well as an inverse SHASH transformation. Y__2 is not normal because it strongly depends on Y__1, and as Y__1 changes (decreases), Y__2 increases, goes through a maximum and then begins to decrease.
Y__3 is NOT normally distributed because of the physical limit of 0 and detection limit of 10. This data is best characterized with a log-normal distribution. If using that as the Distribution in the GenReg platform, then the 0s need to be recoded as something very small and close to, but not 0, like 1e-12 or something.
Y__4 is NOT normally distributed and is also better characterized by a log-normal distribution (using the updated continuous fit functions that can deal with detection limits vs the legacy fitters that suggest a normal 2-mixture instead).
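As background on how a censor column typically enters a log-normal fit, here is a minimal Python sketch of the textbook (Tobit-style) censored log-likelihood. It is not a description of GenReg's internals. Detected rows contribute the log-normal density. Censored rows contribute only the probability of falling below a lower detection limit (Y__3) or above an upper one (Y__4). The data values, the grid ranges, and the function names are hypothetical.

```python
import math

def norm_logcdf(z):
    """log of the standard normal CDF, computed via the complementary error function."""
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))

def censored_lognormal_loglik(y, censored, mu, sigma, limit, lower=True):
    """Log-likelihood of a log-normal model with censoring at a detection limit.

    Detected rows add the log-normal log-density at their value.
    Censored rows add log P(Y < limit) for a lower detection limit (lower=True)
    or log P(Y > limit) for an upper detection limit (lower=False).
    A censored row's recorded value is never logged, so a recorded 0 does not
    need to be recoded to a tiny positive number in this formulation.
    """
    z_lim = (math.log(limit) - mu) / sigma
    censored_term = norm_logcdf(z_lim) if lower else norm_logcdf(-z_lim)
    total = 0.0
    for yi, ci in zip(y, censored):
        if ci:
            total += censored_term
        else:
            z = (math.log(yi) - mu) / sigma
            total += -0.5 * z * z - math.log(yi * sigma * math.sqrt(2.0 * math.pi))
    return total

# Hypothetical Y__3-like values: 0 means "below the detection limit of 10".
y = [0, 0, 14.0, 22.0, 35.0, 18.0, 60.0, 11.0]
cens = [v < 10 for v in y]

# Crude grid search for the maximum-likelihood (mu, sigma) on the log scale.
grid = [(m / 10, s / 10) for m in range(10, 50) for s in range(2, 20)]
mu_hat, sigma_hat = max(grid, key=lambda p: censored_lognormal_loglik(y, cens, p[0], p[1], 10))
```

A real fit would use a proper optimizer. The grid search only keeps the sketch dependency-free.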

Discussion points/things to consider: Below are some topics and discussion points that have either been brought up to me about the DOE results, or that I'm wondering how to handle as I analyze the data.

As was mentioned in the post linked above, because this is a split-plot DOE, it's important to keep the whole-plot and random effects in the model, since the runs were not completely randomized. Removing them can lead to erroneous conclusions.
- Other platforms like Generalized Regression and Generalized Linear Mixed Models can't handle the random effects, but they can include the whole plot as a factor.
  - Does this mean I should stay away from using those platforms as a way to generate alternative models?
- Unfortunately, the only modeling platform that can handle the problem of detection limits for two of my responses is GenReg by using a Censor column. The others can't manage this, nor can they manage the non-normal distributions of 3 of my responses.
  - NOTE: the censor column for Y__3 is different from Y__4 because the detection limits are different.
- This results in a dilemma: use the Mixed Model (or SLS) approach, which handles the whole-plot and random effects but can't manage the non-normally distributed data or the detection limits; or use an alternative platform like GenReg, which can handle the non-normal distributions and the censored (detection-limit) data but can't handle the random effects of the split-plot design.
When running either the standard least squares or mixed model platforms with the data (or even GenReg and GLMM), all model profilers suggest that to maximize Y__1, Y__2, Y__3 and minimize Y__4, the factors should be set to extreme values (either low or high -- which one changes depending on the platform). This doesn't make sense from a domain knowledge perspective. Based on the ranges chosen for the DOE, we anticipated that some or all of the factors would have optimal settings between the extremes.
- At present each response is being treated equally (25%, 0.25) in the set/maximize desirability options. This can change, but doesn't have a large effect on the profiler outcome -- it still suggests setting the factors at extreme values most of the time.
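On the desirability weighting: as I understand it, JMP's profiler combines the individual desirabilities as an importance-weighted geometric mean. The Python sketch below shows that kind of combination, with two consequences. First, only the relative importances matter, so 0.25 each is the same as 1 each. Second, a single zero desirability drives the overall value to zero. The linear "ramp" desirability functions are the classic Derringer-Suich form and are an assumption here; JMP's own desirability curves are shaped differently.

```python
import math

def ramp_larger_is_better(y, low, target):
    """Desirability 0 at or below `low`, 1 at or above `target`, linear in between."""
    return min(max((y - low) / (target - low), 0.0), 1.0)

def ramp_smaller_is_better(y, target, high):
    """Desirability 1 at or below `target`, 0 at or above `high`, linear in between."""
    return min(max((high - y) / (high - target), 0.0), 1.0)

def overall_desirability(ds, importances):
    """Importance-weighted geometric mean of individual desirabilities."""
    total = sum(importances)
    if any(d == 0.0 for d in ds):
        return 0.0  # one completely undesirable response makes the whole setting undesirable
    return math.exp(sum(w / total * math.log(d) for d, w in zip(ds, importances)))

# Hypothetical individual desirabilities for Y__1..Y__4 at one factor setting.
ds = [0.9, 0.6, 0.8, 0.5]
d_equal = overall_desirability(ds, [0.25, 0.25, 0.25, 0.25])
```

Because the combination is multiplicative, the optimizer is drawn toward whichever settings keep every response off zero, which is one reason reweighting importances can leave the suggested settings largely unchanged.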
The chemists have proposed augmenting the DOE by adding center points. I am not sure this would solve the problems we are facing in the analysis, but regardless:
- I have tried to see whether this is possible, but when I do, I have to include the whole plots as a factor, and my only Augmentation Choice is "Augment"; all the others are grayed out.
  - Why do I not have the other options available?
  - Is there a way to add center points or do a space filling without adding some kind of bias to the DOE?

Questions/issues I'm struggling with: Overall, here are the issues I'm trying to manage when analyzing the data, particularly in trying to understand how to keep the profiler from suggesting extreme settings for the factors.

Should I stick with the Mixed Model platform because of the split-plot design no matter what?
Is it at all useful to try and use other platforms (GenReg, GLMM, etc.) to try and analyze the data?
- If it is worthwhile, how best to try and include the whole plot & random effects inherent in the DOE?
What are some best practice methods for managing the strong dependence Y__2, Y__3, and Y__4 have on Y__1 in the analysis?
What are some best practice methods for managing the non-linear responses of Y__2, Y__3, and Y__4? Transforming them, or is there some other/better way?
I'd prefer to get a model for all responses at once, like how SLS can do it, but I don't think that platform is the correct platform to use given the non-normal distributions as well as dependence of the other responses on Y__1. I can save the prediction formulas for the responses from whatever platform I use and then use the Graph > Profiler to generate a prediction profiler, but I'm still stuck with the profiler suggesting extreme settings for the factors, which doesn't make sense.
- Why is the profiler suggesting extreme values?
- Does this indicate a problem with the DOE? If so, what is the problem (or problems)?
- Can the extreme profiler suggestions be resolved somehow?
If augmenting the design by performing more experiments is one way to go, how can I access other augmentation options? Right now, all I can do is change the upper/lower factor settings and define the number of additional runs, I can't do anything else.

Thank you for taking the time to read through this post. Any feedback/thoughts/suggestions are much appreciated.

Thanks!,

DS

MarkovHedgehog9 · Jun 8, 2025 12:27 AM

How are you handling Y__4, given that it has an upper design limit?

SDF1 · Jun 9, 2025 07:52 AM

I don't understand your question.

statman · Jun 8, 2025 10:33 AM

Your discussion is way more than a discussion about split-plots (response variables have nothing to do with split-plots). Did you read the paper re. split-plots I suggested in the discussion you linked?

"All models are wrong, some are useful" G.E.P. Box

Dan_Obermiller · Jun 8, 2025 07:41 PM

@statman is correct that this is way more than a discussion. One thing that I will point out is that the distribution of your response variables is irrelevant. Your response data should have signals in it, which would alter the distribution. The fact that some responses have a non-normal distribution should not be a surprise. The residuals (after the model is fit and signals are explained) are what need to be normally distributed.

Dan Obermiller

SDF1 · Jun 9, 2025 08:54 AM

It may be a complicated and detailed discussion, but still a discussion nonetheless.

Semantics aside, I find it interesting that you say the distribution of the response is irrelevant -- this seems counter to what other JMP colleagues discussed during a Discover Summit back in 2021 or 2022. That presentation covered the new detection limit column property in JMP 16, and there was a good deal of commentary about the non-normal distribution of the response and how it needed to be properly accounted for by using a log-normal distribution when modeling the data with the GenReg platform. So, what is the correct approach: do we treat the non-normal distribution of the response as irrelevant, or do I take it into consideration? It's hard to see how both can be right. I completely get your point about the residuals being normally distributed.

As for the other questions I posed, are there any thoughts on those? For example, the profiler is suggesting only extreme values as optimal settings for our combination of responses. From our domain knowledge, this doesn't make sense. Is this an indication that something has gone wrong with the DOE, or that some other systematic noise has been introduced into the data? Or is it an indication that something else is not working properly?

Although I pose some questions specific to my case, I also pose some very general questions that I imagine others have come across (or might come across in the future), hence their general relevance to the community. Many of the questions are about how to manage and handle the data when the analysis is not an ideal situation. Most, if not all, of the example data provided with JMP show ideal situations, but reality tends to be messier and needs a more nuanced approach.

I'm still hopeful to have continued discussions, especially about the more general questions.

Thanks,

DS

Dan_Obermiller · Jun 9, 2025 09:50 AM

Let me try to clarify my comment. Saying the distribution of the response is irrelevant is a bit too strong, really. But let me paint a picture.

Suppose I am measuring the blood pressure of some patients. All of these patients have high blood pressure. After taking the measurement, I give them all a very powerful blood pressure medication. After a month, I measure their blood pressure again. Would you expect the distribution of all blood pressures to be normally distributed? The answer is no. The medication would (hopefully) make the distribution bimodal. The signal is in the raw data and needs to be removed before checking for normality. All of the statistical inferences are based on the errors or residuals being normally distributed.
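To make the blood pressure picture concrete, here is a small Python simulation sketch. All of the numbers are hypothetical: 200 readings before and 200 after, a 40 mmHg drop, and 8 mmHg of noise. The pooled raw data are bimodal because the treatment signal is in them. Once the "model" (here, just one mean per group) removes that signal, the residuals are a single, centered hump, and those residuals are what the normality assumption applies to.

```python
import random
import statistics

rng = random.Random(42)
# Hypothetical blood pressures (mmHg) before and after a strong medication.
before = [rng.gauss(160, 8) for _ in range(200)]
after = [rng.gauss(120, 8) for _ in range(200)]
raw = before + after  # pooled raw data: two humps, clearly not normal

# "Fit the model": here the model is simply one mean per group.
m_before = statistics.fmean(before)
m_after = statistics.fmean(after)
residuals = [v - m_before for v in before] + [v - m_after for v in after]

print(f"raw SD = {statistics.stdev(raw):.1f}, residual SD = {statistics.stdev(residuals):.1f}")
```

In practice you would look at a normal quantile plot of the residuals rather than of the raw response.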

Now as far as what other JMP colleagues were saying, I cannot comment. I was not there. However, if experience suggests that the variable has a certain distribution in a steady state, then you should use that certain distribution. That is commonly what is done when specifying an error distribution. Beyond that, I would need to have been part of that conversation to comment more.

Dan Obermiller

SDF1 · Jun 9, 2025 11:43 AM

Hi @Dan_Obermiller ,

I understand your scenario and your comments about the error distributions. As far as the responses of my DOE go, as I replied to @MRB3855, I don't care what the distributions are from the standpoint of the response -- they could be bimodal, log-normal, SHASH, what have you. They are what they are for various (mostly known) reasons, and there is no indication that special cause variation took place during the DOE that would justify eliminating any of the data.

The problem is how to manage this during modeling/analysis, since several JMP Community and Help resources give conflicting recommendations. Some maintain it's critical to use the mixed model because it keeps the whole-plot and random effects in the model, but it also comes with its own set of downsides (one is non-physical prediction values). On the other hand, GenReg can manage the non-normal responses and the whole-plot effect (and provide physically valid predictions), but it can't manage the random effects. The mixed model also results in normally distributed residuals with mean near 0, while GenReg does not. What's a good approach and why? (I'm not asking for the "right" approach, just a good one.) I have also tried analyzing the data using the GLMM platform, which does an OK job for 3 of the 4 responses, but fails very badly with the one data set that is log-normal and bounded by values >= 0.
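On the "non-physical prediction values" point, here is a small Python sketch with hypothetical data (a response that halves with each unit of a factor), unrelated to the attached table. It contrasts a straight-line fit on the original scale with the same fit on the log scale, which is essentially what a log-normal fit does, followed by back-transformation. It deliberately leaves out the random effect and only shows the scale issue. At a setting slightly outside the data, the original-scale model predicts a negative value, while the log-scale model stays positive.

```python
import numpy as np

# Hypothetical response that decays toward 0 as a factor increases.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([40.0, 20.0, 10.0, 5.0, 2.5])
x_new = 6.0  # a setting slightly outside the data

# Straight line on the original scale (normal errors, identity link).
lin_coef = np.polyfit(x, y, 1)
lin_pred = np.polyval(lin_coef, x_new)

# Straight line on the log scale, then back-transformed with exp().
log_coef = np.polyfit(x, np.log(y), 1)
log_pred = np.exp(np.polyval(log_coef, x_new))

print(lin_pred, log_pred)
```

One caveat: exp() of a log-scale prediction estimates the median of a log-normal response, not its mean (the mean is larger by a factor of exp(sigma^2 / 2)).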

As this is not an ideal case, where most of the JMP sample data tend to be, what are some best practices for managing the analysis when it's not ideal? The same goes for the other general questions I posed.

DS

Victor_G · Jun 10, 2025 9:03 AM

Hi @SDF1,

Tricky question and complex situation, but you're right, this is often the case, unlike "perfect" toy/learning datasets.

Having faced similar (and not ideal) situations, here is what I could recommend from a practical point of view :

Identify patterns and anomalies : Plot the data to check for strange patterns in the responses depending on your whole plot effect, and to see whether you can already identify relationships between your factors and responses (+ correlations between responses). You can use the Multivariate platform to identify correlations as well as potential outliers, using the Outlier analysis available in this platform.
First model iteration :
- Proceed with the full assumed model. Decide how to treat the whole plot effect: random effect (influence on response variance) or fixed effect (influence on mean response)? The answer should be dictated by the "physical meaning" of the whole plot and what it represents. In both cases, fitting the full assumed model will help you understand how much variability is captured by this effect (its importance for the response mean or for the response variance, depending on the type of effect).
  On your data table, treating the whole plot effect as random, its influence varies by response: from insignificant (Y2), to medium (around 19% and 23% of the total variance for Y3 and Y1), to high (more than 40% of the variance captured by the whole plot effect for Y4). This gives you an idea of how much precision you could lose by dropping this random effect from the model. I would try to keep a mixed model for Y4, Y3 and Y1 if possible; there is no need for a mixed model for Y2.
- At this stage, you can check the residuals from your model (and the Actual by Predicted plot) for a curved pattern that may indicate an inadequate model (missing term?) or the need for a transformation.
  Be careful about transformation vs. generalized model, as they do not handle the data in the same way (see the Community thread "Difference between "least square" and "generelized linear method" in the fit model"). A transformation affects your data directly (so both the mean and the variance of the response), whereas generalized models use a link function to transform the mean into a linear function of the predictor variables, and a variance function to allow for variance heterogeneity in the analysis rather than trying to transform it away. I wouldn't transform the responses unless there is a strong indication that it is needed (a simpler model and better residual patterns with the transformed response, which seems to be the case for Y2).
- Finally, you can also check which effects are statistically and practically significant, to decide which effects are worth including in later model iterations. You can also compare the results with other platforms, like the Fit Two Level Screening platform, which can help identify active effects, or Fit Least Squares with a random effect.
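To show what "X% of the total variance captured by the whole plot effect" means, here is a minimal Python sketch using the classic balanced one-way random-effects ANOVA (method-of-moments) estimator. This is a simplification and an assumption. It fits an intercept-only model with equal-sized whole plots. JMP's REML fit, with the fixed effects included and whole plots of 7 or 8 runs, will not give the same numbers, so it will not reproduce the percentages above.

```python
def whole_plot_variance_share(whole_plots):
    """Share of total variance attributed to whole plots (balanced one-way
    random-effects ANOVA, method-of-moments estimator).

    whole_plots: list of lists, one list of responses per whole plot, all the
    same length. No fixed effects are removed first (intercept-only model).
    """
    a = len(whole_plots)
    n = len(whole_plots[0])
    if any(len(wp) != n for wp in whole_plots):
        raise ValueError("this sketch assumes equal whole-plot sizes")
    means = [sum(wp) / n for wp in whole_plots]
    grand = sum(means) / a
    ms_between = n * sum((m - grand) ** 2 for m in means) / (a - 1)
    ms_within = sum((v - m) ** 2 for wp, m in zip(whole_plots, means) for v in wp) / (a * (n - 1))
    var_wp = max((ms_between - ms_within) / n, 0.0)  # negative estimates truncated at 0
    total = var_wp + ms_within
    return var_wp / total if total > 0 else 0.0

# Hypothetical data: two whole plots of two runs each.
share = whole_plot_variance_share([[0.0, 2.0], [4.0, 6.0]])
```

A share near 0 means that dropping the random effect costs little. A share near 1 means most of the variation sits between whole plots, and the effective sample size for whole-plot factors is closer to the number of whole plots than to the number of runs.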
Models exploration/comparison/selection : Use various platforms depending on the assumptions and previous results :
- For responses with a medium/high impact of the whole plot (random) effect (Y4, Y3 and Y1), you can use Mixed Model or a Least Squares model with a whole plot random effect. To deal with censored data, you can use Weights in the model: create a numeric continuous column where censored data has a low value (binary weights: 0.5 for example, or based on the ratio of censored to non-censored data, or a more complex weighting trend in which the farther an observation is from the detection limit, the lower its weight/importance in the model fitting) and non-censored data has a high value (binary weights: 2 for example, or based on the ratio of non-censored to censored data, or more complex as described above). This will bias the model fitting to put more importance on non-censored data and give little importance ("weight") to censored data (since it can't be measured exactly, it is not critical to have a good fit and low residuals in the censored region). You can then check the "Actual by Predicted" plot to see whether the residuals seem acceptable for the non-censored part, as well as checking the residuals and verifying the regression assumptions for the non-censored data.
  Example with the response Y__3, with a weight formula using the inverse of the squared distance to the detection limit of 10:
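```
// Row weight for Y__3 (lower detection limit = 10).
// At or above the limit: full weight 1. Below it: 1 / (distance to the limit)^2,
// so the weight shrinks as the value moves farther below the limit
// (values less than 1 unit below the limit get weights above 1).
// ">=" avoids a division by zero when Y__3 equals the limit exactly.
If( :Y__3 >= 10,
	1,
	1 / ((10 - :Y__3) ^ 2)
)
```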
  And the modeling results (with the whole plot random effect):
- For responses with a low impact of the whole plot (random) effect (Y2), you can try different models with Fit Least Squares or GenReg, for example. You can then compare the effects included in the models, and compare models based on various criteria : statistical significance, RMSE, R²/R²-adjusted, AICc/BIC, ...
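As a reminder of what a weight column does in a least-squares fit, here is a minimal Python/numpy sketch. The fit minimizes the weighted sum of squared residuals, sum of w_i * (y_i - prediction_i)^2, so rows with tiny weights (the censored ones) barely pull on the estimates. The data, the single coded factor, and the helper names are hypothetical. The weight rule is the Y__3 formula above, written with ">=" so that a value exactly at the limit gets weight 1 instead of a division by zero.

```python
import numpy as np

DETECTION_LIMIT = 10.0  # lower detection limit of Y__3

def y3_weight(value):
    """1 at or above the limit; 1 / (distance below the limit)^2 otherwise."""
    if value >= DETECTION_LIMIT:
        return 1.0
    return 1.0 / (DETECTION_LIMIT - value) ** 2

def weighted_least_squares(X, y, w):
    """Minimize sum_i w_i * (y_i - X_i @ beta)^2 by scaling each row by sqrt(w_i)."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

# Hypothetical data: intercept + one coded factor; two censored rows recorded as 0.
x = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])
y = np.array([0.0, 0.0, 12.0, 30.0, 25.0, 35.0])
X = np.column_stack([np.ones_like(x), x])
w = np.array([y3_weight(v) for v in y])

beta_weighted = weighted_least_squares(X, y, w)
beta_unweighted = weighted_least_squares(X, y, np.ones_like(y))
```

Comparing beta_weighted with beta_unweighted shows how much the censored zeros were dragging the low-level mean down.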
Validation & Augmentation : Validate your models using validation runs. It may seem strange that extreme settings are suggested for the factors, but since your DoE mostly uses two-level factors, there is little curvature your models can capture (besides that introduced by interaction effects).
Augmenting your DoE to include 2nd-order (or higher) polynomial terms may be a good idea (rather than adding center points, which can signal overall curvature but can't estimate the individual polynomial terms that may contribute to it), so that you can define a design space satisfying your 4 responses.
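Here is a small Python/numpy sketch of both points for two coded factors. The model coefficients are hypothetical. First, a model with only main effects and a two-factor interaction is linear in each factor separately, so over the coded box its optimum always lands on a corner. Second, the matrix rank shows what each set of runs can estimate for a full quadratic model with 6 terms. A two-level factorial estimates 4 of them. Adding center points raises this to 5, which detects curvature, but x1^2 and x2^2 remain indistinguishable. Adding axial points raises it to all 6. JMP's Augment Design would choose its own runs; this only illustrates the estimability argument.

```python
import itertools
import numpy as np

# Part 1: hypothetical fitted model with main effects and one interaction.
def predict(x1, x2):
    return 50 + 4 * x1 - 3 * x2 + 2 * x1 * x2

levels = np.linspace(-1, 1, 21)  # coded settings, including the -1 and +1 ends
best = max(itertools.product(levels, levels), key=lambda p: predict(*p))

# Part 2: which quadratic terms can each set of runs estimate?
def model_matrix(points):
    # Columns: intercept, x1, x2, x1*x2, x1^2, x2^2 (6 terms).
    return np.array([[1, a, b, a * b, a * a, b * b] for a, b in points], dtype=float)

factorial = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
center = [(0, 0), (0, 0)]
axial = [(-1, 0), (1, 0), (0, -1), (0, 1)]

rank_factorial = np.linalg.matrix_rank(model_matrix(factorial))
rank_with_center = np.linalg.matrix_rank(model_matrix(factorial + center))
rank_with_axial = np.linalg.matrix_rank(model_matrix(factorial + center + axial))
```

With only corner runs, x1^2 and x2^2 are both identically 1, which is the intercept column. That is why a model fitted to a two-level design cannot place an optimum in the interior of the range.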

Some other considerations :

Depending on when Y2, Y3 and Y4 are measured relative to Y1 (the primary response of interest) and whether the idea is relevant, you could add these intermediate responses as predictors for Y1. The process would then be to predict Y2, Y3 and Y4 from the factors, and then predict Y1 from the predicted responses + factors. There is a risk of inflating the prediction error with this intermediate step, but it may be worth a try. You could also model Y1 from Y2, Y3 and Y4 only, using PLS or other modeling platforms able to deal with collinearity.
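Here is a minimal Python/numpy sketch of the two-stage idea. The simulated data and the names y_mid and y_primary are hypothetical. Stage 1 predicts the intermediate response from the factors. Stage 2 is trained on the factors plus the observed intermediate response, and at prediction time the stage-1 prediction is plugged in. One caveat the sketch makes checkable: when both stages are ordinary least-squares fits on the same factor terms, the chained prediction reproduces a direct regression of the primary response on the factors (a consequence of the Frisch-Waugh-Lovell theorem). The extra stage can only add something when a stage uses a different model form, such as a non-linear relationship between the responses.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
X = rng.uniform(-1, 1, size=(n, 2))   # hypothetical coded factors
D = np.column_stack([np.ones(n), X])  # intercept + main effects

# Hypothetical truth: an intermediate response drives the primary response.
y_mid = 5 + 2 * X[:, 0] - X[:, 1] + rng.normal(0, 0.5, n)
y_primary = 1 + 3 * y_mid + 0.5 * X[:, 1] + rng.normal(0, 0.5, n)

# Stage 1: intermediate response from the factors.
b1, *_ = np.linalg.lstsq(D, y_mid, rcond=None)
# Stage 2: primary response from the factors plus the observed intermediate response.
b2, *_ = np.linalg.lstsq(np.column_stack([D, y_mid]), y_primary, rcond=None)

def predict_primary(x_new):
    """Chain the stages: predict the intermediate response, then plug it into stage 2."""
    d_new = np.column_stack([np.ones(len(x_new)), x_new])
    mid_hat = d_new @ b1
    return np.column_stack([d_new, mid_hat]) @ b2

x_new = np.array([[0.0, 0.0], [0.5, -0.5]])
chained = predict_primary(x_new)
```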
You could also try Machine Learning approaches, as they may be good comparative models for estimating factor importance, compared to "traditional" statistical modeling. Also, depending on how you would like the prediction profile to look, the "step-based" prediction profile of tree-based models may be interesting for censored data, for example to avoid predicting precisely below a certain threshold.
Example here for Y4 with a simple partition tree with only 2 splits, you can see that X5 and X6 have the highest influence on Y4, and you can separate censored data from the rest of the data (with the second split at X6 < 65) :

Even if this model may not be highly predictive or "practically convenient", this kind of analysis is helpful to compare with other models, and try to identify important predictors.
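For readers curious how a partition tree chooses a split such as "X6 < 65", here is a minimal Python sketch of the basic criterion for a continuous response. It tries every midpoint between sorted distinct values of one predictor and keeps the split with the smallest total within-group sum of squares. As far as I know, JMP's Partition platform ranks candidate splits by LogWorth, so this is the underlying idea rather than its exact algorithm. The data in the usage example are hypothetical.

```python
def best_split(x, y):
    """Return (threshold, sse) for the single split on x that minimizes the total
    within-group sum of squared errors of y (one regression-tree split)."""
    def sse(vals):
        if not vals:
            return 0.0
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best

# Hypothetical example: the response jumps between x = 3 and x = 4.
split = best_split([1, 2, 3, 4, 5, 6], [0, 0, 0, 10, 10, 10])
```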

And at the end, a model isn't meant to be perfect, it's meant to be useful. There might be compromise in the modeling platforms chosen, but you should be able to justify your choices based on validation data and domain expertise.

Hope this (long) answer might provide some useful ideas,

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)

statman · Jun 10, 2025 12:22 PM

Victor, I agree completely with your first step, but I would like to propose an alternative to your step #2. You should not fit a full model, as there are 2 different error structures (whole plot and subplot). This would make for inappropriate comparisons. For example, comparing the WP factor(s) to the MSE of the subplot for statistical significance is comparing apples to oranges; that p-value is useless. You should essentially treat the whole plot and subplot as if they were 2 different experiments. Please read Box and Jones, and also Anderson and McLean, Sanders, and Bisgaard.
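To make the "apples to oranges" point concrete, here is a minimal Python sketch of the two error strata for a balanced split-plot. The setup is a simplification and an assumption: the hard-to-change factor has two levels with two whole plots at each level (4 whole plots, as in this design, but balanced), and there are no subplot factors. The data are hypothetical. The whole-plot factor is tested once against the whole-plot error mean square (correct stratum) and once against the within-whole-plot mean square (the naive full-model comparison).

```python
def split_plot_wp_tests(plots):
    """plots: dict mapping each level of the whole-plot factor to a list of whole
    plots, each whole plot a list of subplot responses (balanced design).
    Returns F ratios and degrees of freedom for the whole-plot factor tested
    against the whole-plot error and, naively, against the subplot error."""
    levels = list(plots)
    a = len(levels)                   # levels of the whole-plot factor
    b = len(plots[levels[0]])         # whole plots per level
    n = len(plots[levels[0]][0])      # runs per whole plot
    all_vals = [v for lv in levels for wp in plots[lv] for v in wp]
    grand = sum(all_vals) / len(all_vals)
    ss_factor = ss_wp_err = ss_within = 0.0
    for lv in levels:
        wp_means = [sum(wp) / n for wp in plots[lv]]
        lv_mean = sum(wp_means) / b
        ss_factor += n * b * (lv_mean - grand) ** 2
        for wp, m in zip(plots[lv], wp_means):
            ss_wp_err += n * (m - lv_mean) ** 2          # whole plot to whole plot
            ss_within += sum((v - m) ** 2 for v in wp)   # run to run within a whole plot
    ms_factor = ss_factor / (a - 1)
    ms_wp_err = ss_wp_err / (a * (b - 1))
    ms_within = ss_within / (a * b * (n - 1))
    return {"F_correct": ms_factor / ms_wp_err, "df_correct": (a - 1, a * (b - 1)),
            "F_naive": ms_factor / ms_within, "df_naive": (a - 1, a * b * (n - 1))}

# Hypothetical data: oven temperature at two levels, two whole plots per level.
res = split_plot_wp_tests({-1: [[10, 12], [14, 16]], 1: [[20, 22], [24, 26]]})
```

Here the naive F (100 on 1 and 4 df) is eight times the correct one (12.5 on 1 and 2 df), because run-to-run variation within an oven setting is much smaller than the variation between whole plots.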

Box, G. E. P., & Jones, S. (1992). Split-plot designs for robust product experimentation. Journal of Applied Statistics, 19(1).

Jones, B., & Nachtsheim, C. J. (2009). Split-plot designs: What, why, and how. Journal of Quality Technology, 41(4), 340–361.

Anderson, V., & McLean, R. (1974). Design of experiments: A realistic approach. Marcel Dekker. ISBN 0-8247-7493-0.

Sanders, D., & Coleman, J. (2003). Recognition and importance of restrictions on randomization in industrial experimentation. Quality Engineering, 15(4), 533–543. https://doi.org/10.1081/QEN-120018386

Bisgaard, S. (2000). The design and analysis of 2^(k-p) × 2^(q-r) split plot experiments. Journal of Quality Technology, 32(1), 39–56. https://doi.org/10.1080/00224065.2000.11979970

"All models are wrong, some are useful" G.E.P. Box
