Re: How to use the effect summary effectively for a mixture DOE?

AutoSetMarmoset · Jul 2, 2024 08:55 PM

Hello,

I ran a DOE in JMP 17 and am starting to analyze the data using the Effect Summary. It is a four-component mixture. I've watched some videos in the JMP catalogue on using the Effect Summary for mixtures and have been following their approach: removing sources with a p-value > 0.05, provided no higher-order term that contains them remains in the model. Are there any circumstances in which I should deviate from that? I've attached a picture of the Effect Summary below. For example, would there be any case where I would remove source A? I'm asking because the RSq value in the Actual by Predicted plot (also attached) is a bit low.

Thanks!
[Attached images: Actual by Predicted Plot; Effect Summary]

Victor_G · Jul 3, 2024 1:26 AM

Hi @AutoSetMarmoset,

Mixture designs are optimization designs: the emphasis is on predictive performance rather than on screening and filtering terms by statistical significance.

It makes more sense to start the analysis from the full assumed model, with the terms you entered when creating the design, and then remove terms (except main effects) based on the predictive performance of the model (RMSE, for example), NOT on the individual p-value/LogWorth of each term. Mixture factors are correlated with one another (multicollinearity) and this type of model has no intercept, so p-values/LogWorth are not a valid metric for model selection.
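To make "the full assumed model" concrete, here is a minimal Python sketch (not JSL, and not taken from the poster's table) that lists the terms of a special-cubic Scheffé model for four components and shows why no intercept is fitted. The component names A to D and the example blend are assumptions for illustration only.

```python
from itertools import combinations

def scheffe_terms(components, max_order=3):
    """Return Scheffe model terms up to max_order as tuples of component names."""
    terms = []
    for order in range(1, max_order + 1):
        terms.extend(combinations(components, order))
    return terms

def model_row(blend, terms):
    """Evaluate each term (product of the component proportions) for one blend."""
    row = []
    for term in terms:
        value = 1.0
        for name in term:
            value *= blend[name]
        row.append(value)
    return row

components = ["A", "B", "C", "D"]           # hypothetical component names
terms = scheffe_terms(components)           # 4 linear + 6 two-way + 4 three-way
blend = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}  # hypothetical blend
row = model_row(blend, terms)

# The four linear columns always add up to 1 for any valid blend,
# so an intercept column would be redundant with them.
linear_sum = sum(row[:4])
print(len(terms), linear_sum)
```

Because the linear columns sum to one in every run, they already span the constant term, which is one reason the usual term-by-term significance tests do not carry over cleanly to mixture models.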

Note that there are many ways to evaluate model performance/adequacy, depending on your objective(s) and the metrics you use. Complementary estimation and evaluation metrics, such as log-likelihood, information criteria (AICc, BIC), explanatory power (R2 and adjusted R2), and predictive power (MSE/RMSE/RASE), offer different perspectives and may favor different models.
You can then use domain expertise and these metrics to select the most appropriate model(s) for your topic, and either compare the individual predictions of the different models (to see how and where they differ) or use a combined model to average out the prediction errors.
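As a rough illustration of how these metrics trade fit against complexity, here is a small Python sketch that computes RMSE, AICc and BIC from a residual sum of squares under the usual Gaussian least-squares assumptions. The function name `fit_metrics` and all the numbers are made up for the example; they are not from the poster's data, and the values JMP reports should be preferred when available.

```python
import math

def fit_metrics(sse, n, p):
    """Model-comparison metrics for a least-squares fit.

    sse: residual sum of squares, n: number of runs,
    p: number of estimated coefficients (a Scheffe model has no intercept).
    """
    k = p + 1  # the error variance counts as one more estimated parameter
    neg2_loglik = n * math.log(2 * math.pi) + n * math.log(sse / n) + n
    rmse = math.sqrt(sse / (n - p))  # root mean square error
    aicc = neg2_loglik + 2 * k + 2 * k * (k + 1) / (n - k - 1)
    bic = neg2_loglik + k * math.log(n)
    return {"RMSE": rmse, "AICc": aicc, "BIC": bic}

# Hypothetical comparison: a 14-term full model vs. a 10-term reduced model, 30 runs.
full = fit_metrics(sse=6400.0, n=30, p=14)
reduced = fit_metrics(sse=7000.0, n=30, p=10)

for name, metrics in (("full", full), ("reduced", reduced)):
    print(name, {key: round(value, 2) for key, value in metrics.items()})
```

In this made-up case the reduced model has a slightly larger residual sum of squares but a lower RMSE, AICc and BIC, because it spends fewer parameters. With real data the metrics can disagree, which is where domain expertise comes in.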

Here are some relevant posts in the forum dealing with model selection for (mixture) designs :

Backward regression for Mixture DOE analysis with regular (non pro) JMP?

Analysis of a Mixture DOE with stepwise regression

removing terms from a model following a designed experment

If you need more detailed advice or guidance, could you share an anonymized version of your DoE?

I hope this answer will help you,

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)

frankderuyck · Jul 3, 2024 06:00 AM

Analysis of mixture DOE is not easy; I experienced that R² may improve by removing the 3rd order term

AutoSetMarmoset · Jul 3, 2024 01:33 PM

Hello all,

Thanks for your replies - I've attached an anonymized version of the JMP sheet I have been working with. For some more background, here is how I have been trying to analyze the data. Between my four components I used a Scheffé cubic model, removed some of the higher-order terms, and fit it separately. Then I look at the Effect Summary and remove effects with a p-value > 0.05, provided they are not contained in a higher-order effect that is still in the model. Afterwards, I look at the Mixture Profiler or Prediction Profiler to begin to understand what the tradeoffs are within the mixture space. I am relatively new to this and trying to learn best practices, so I appreciate the help.

Edit: Just want to add some additional context after reading some of the other discussions - these are four core components of a formulation we work with, and I wanted to determine the best space and boundaries that give both a reasonable mixture quality score and a reasonable rate of phase separation. We have some intuition on how some components may affect these qualities, but it was never explored systematically. I've been fitting the model using standard least squares.

Thanks!

Victor_G · Jul 4, 2024 02:37 AM

Hi @AutoSetMarmoset,

Can you edit your message? There is no attached file.

As general advice, you should use the predefined/assumed model of your design to analyze the data.

If necessary/possible, you can refine your model by removing some terms, but for mixture designs the use of p-values to refine the model is not recommended as they might be biased (for the reasons I have mentioned earlier).

You need to use a different metric to help you balance model complexity against an acceptable predictive precision.

Hope this answer may help you in the meantime,

Victor GUILLER


AutoSetMarmoset · Jul 5, 2024 01:23 PM

Hi Victor,

Attached the anonymized study again for your reference.

Thanks

Victor_G · Jul 7, 2024 1:57 AM

Hi @AutoSetMarmoset,

Looking at your response "Time until phase separation", your complete assumed model with main effects, two-factor interactions and three-factor interactions seems to do a pretty good job:

Quite low RMSE (around 20) compared to the range of the response (from 5 to 500).
No significant patterns in the residuals, except that the measured response values are very different: you seem to have 3 blocks of values, around 5, around 100 and around 500. Do you have a detection limit at 500, or a lower one at 5? The analysis could be done by binning these 3 blocks of results. Fitting an ordinal logistic model on this transformed response leads to results similar to those for the continuous response: a C/D ratio of 40/60 or 30/70 would lead to a decrease of the time until separation. Validation point(s) in this area would be useful to confirm this optimum.
Concerning the response "Mixture quality", it seems to be a rating, so I would recommend switching the Modeling Type to "Ordinal" and fitting a logistic model. Since I don't know the ranking order (1 best or 1 worst?), you can use the script created for the analysis, change the desirabilities if needed, and see whether you have to find a compromise between the optima of the two responses.
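The binning idea for the time response can be sketched outside JMP as follows. This is only an illustration in Python: the cut points (50 and 300), the category labels and the sample values are assumptions, and the real cut points should be chosen from the gaps in the actual data.

```python
def bin_time(value, low_cut=50.0, high_cut=300.0):
    """Map a time-until-separation value to an ordered category.

    The cut points are hypothetical; choose them from the gaps in the real data.
    """
    if value < low_cut:
        return "short"
    if value < high_cut:
        return "medium"
    return "long"

# Made-up values near the three clusters described above (about 5, 100 and 500).
times = [5, 6, 98, 105, 480, 500]
levels = [bin_time(t) for t in times]
print(levels)
```

The resulting column would then be treated as an ordinal response, which is what the ordinal logistic fit mentioned above works on.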

I saved the scripts I used to test the various options and models proposed.

I hope you'll find some options to analyze your responses,

Victor GUILLER


statman · Jul 7, 2024 9:52 AM

Here are some thoughts:

1. The response variables are challenging. The time response is unusual. There appear to be discrimination issues, or the data are truncated (values of 5 and 500; my guess is that 500 means no separation). The "quality score" lacks discrimination (ordinal scales should have a minimum of 5 categories and use multiple assessors). It also does not correlate well with the time. This means you will likely have tradeoffs between the 2 responses.

2. Analysis of mixtures is not your typical DOE analysis. There is likely collinearity between the terms and interactions (which are actually non-linear effects). You can't use p-values as you normally would because model assumptions are violated, particularly independence. Also, R^2 is likely exaggerated. That is not a big problem, as you already know these components are significant. Mixture designs are biased toward optimization (rather than screening).
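A quick way to see the collinearity is to compute the correlation between two component columns in a simple mixture design. The Python sketch below builds a generic four-component simplex-centroid-style set of points (vertices, 50/50 binary blends and the overall centroid). This is an assumed textbook layout, not the design from the attached table.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

q = 4  # number of mixture components
points = []
for order in (1, 2, 4):  # vertices, binary blends, overall centroid
    for subset in combinations(range(q), order):
        points.append([1.0 / order if i in subset else 0.0 for i in range(q)])

col_a = [p[0] for p in points]
col_b = [p[1] for p in points]
r_ab = pearson(col_a, col_b)
print(len(points), round(r_ab, 4))
```

Even in this fully symmetric design the two columns are negatively correlated, simply because raising one proportion forces the others down. The factors therefore cannot be varied independently, which is the assumption the usual p-values rely on.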

3. Use the Mixture Profiler along with your SME. Make sure it makes sense. Then replicate the results...

Lastly, you might want to read up on mixtures. Cornell is the accepted standard.

Cornell, John (1990) “Experiments with Mixtures, Designs, Models, and the Analysis” Wiley (ISBN:047152221X)

"All models are wrong, some are useful" G.E.P. Box

AutoSetMarmoset · Jul 8, 2024 02:02 PM

Hello @Victor_G @statman

Appreciate the detailed responses! Both of you have brought up issues I have with our testing: discrimination is a bit difficult, which adds some noise to the measurement. In terms of analysis, I am beginning to get an idea of a good way to analyze the data, using RMSE rather than R^2 because of the complex mixture interactions. One thing I am still unsure about is which terms to remove (and where to draw the line) to improve the model's predictions. I want to be careful since I don't want to mislead others.

Appreciate the suggestion on the textbook! This is something I have been looking for and will start reading more about this.

Thanks

statman · Jul 3, 2024 09:56 AM

You should be using the Mixture Profiler to evaluate a mixture design.

"All models are wrong, some are useful" G.E.P. Box
