
How to analyze historical mixture data and maximize desirability for next formulations?

Hello everyone,

 

I'm relatively new to analyzing mixture formulations across multiple response variables and could use some guidance. Currently, I'm working on a new dataset that incorporates historical data from various formulations within our company. My aim is to leverage JMP for analyzing how different weight percentages of ingredients impact these response variables.

Attached is an example dataset, and I'm seeking advice on the best approach for processing this data. Historically, we've evaluated 24 different ingredients and are keen on maximizing desirability in our formulations.

 

Essentially, I'm looking to identify trends between the weight percentages of specific ingredients and the corresponding response variables. Additionally, I'm interested in receiving recommendations on 'ideal' formulations within certain bounds. It would be immensely helpful if this could also suggest a Design of Experiments (DOE) approach to test using our existing dataset.

 

Any insights or suggestions would be greatly appreciated.

Thank you in advance!

 

1 ACCEPTED SOLUTION


Re: How to analyze historical mixture data and maximize desirability for next formulations?

Analyzing historical mixture data has a few pitfalls, which is why people will often run a designed experiment rather than try to utilize historical data. Historical data was collected under varying conditions (which likely were not recorded -- think equipment changes as one example) and with different purposes in mind, which may make combining that data nonsensical. Even differences in when the data was collected could cause problems with combining it. However, if you want to press on, here are some additional thoughts and concerns.

 

You must make sure that your ingredient columns add to EXACTLY 1; round-off error is not acceptable. For example, in your example dataset, row 2 adds up to 1.000000001. When you fit the mixture model, you should look at the Analysis of Variance report (you will have to request it, as it is not shown by default). You should see a message at the bottom of that table that says "Tested against reduced model Y=mean". If that message is there, you are fitting a mixture model. But if the message says "Tested against reduced model Y=0", then you are NOT fitting a mixture model; you are fitting a no-intercept model, which is not the same thing. That last message is usually caused by the mixture ingredients not always adding to 1.
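As a quick sanity check outside JMP, the row sums can be verified and renormalized programmatically before the data ever reaches Fit Model. Here is a rough Python/pandas sketch -- the column names are hypothetical, and dividing by the row total assumes any discrepancy is pure round-off error, not a data-entry mistake:

```python
import pandas as pd

def fix_mixture_sums(df, ingredient_cols):
    """Divide each row's ingredient values by that row's total
    so every row sums to 1 (up to floating-point precision)."""
    sums = df[ingredient_cols].sum(axis=1)
    df[ingredient_cols] = df[ingredient_cols].div(sums, axis=0)
    return df

# Example: a row that adds to 1.000000001, like row 2 of the dataset
df = pd.DataFrame({"X1": [0.3], "X2": [0.3], "X3": [0.400000001]})
fix_mixture_sums(df, ["X1", "X2", "X3"])
```

A row that is far from 1 probably signals a real problem (a missing ingredient, or parts rather than weight fractions), so it is worth inspecting the row sums before renormalizing anything.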

 

From your example dataset, is there a difference between a missing value for an ingredient and a level of 0? Conceptually, I believe you could replace all of your missing values with 0's. It may be helpful to do that because a missing value for an input variable could lead to that observation being excluded from the analysis.

 

The same is true for responses. If a response value is missing, that will definitely cause problems for the analysis: you must have a response in order to include that row. Therefore, analyze each response variable separately in order to utilize as much data as possible for each model.

 

If your example data is realistic, you have MANY ingredients, which will naturally lead to a VERY large model. Since you are looking for optimization, you will want a model with the 2-way blending terms (they look like interactions) and possibly the special cubic terms (they look like 3-way interactions) at a minimum. You will need many observations to fit that kind of model: with just the 2-way terms, you have 24 main effects + 276 2-way terms, which is 300 model terms, so at least 300 observations. Further, a model that large will be very hard to interpret and visualize. Again, a designed experiment would let you focus on key questions, which would allow you to work with fewer ingredients to answer your specific questions.
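The term counts above can be reproduced with a short calculation. This sketch (nothing JMP-specific) just counts Scheffé model terms for q mixture ingredients:

```python
from math import comb

def scheffe_term_count(q, degree=2):
    """Count Scheffé mixture model terms for q ingredients:
    q linear terms plus C(q, 2) two-way blending terms,
    plus C(q, 3) special-cubic terms when degree == 3."""
    n = q + comb(q, 2)
    if degree == 3:
        n += comb(q, 3)
    return n

print(scheffe_term_count(24))            # 24 + 276 = 300
print(scheffe_term_count(24, degree=3))  # adds 2,024 special cubic terms
```

Adding the special cubic terms for 24 ingredients pushes the model past 2,300 terms, which makes the case for reducing the ingredient list via a designed experiment even starker.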

 

Before building any model, you should turn on the pseudo-component coding for the mixture ingredients in order to minimize any correlations among the model terms. Although it will not remove the correlation, it will minimize it and this will help the analysis.

 

You should also turn on the Mixture column property, which it looks like you did for your example. 

 

If you have not done so yet, do some research on how fitting a Scheffé mixture model differs from regular regression. Interpretation of a mixture model is different.

 

Optimization of multiple responses can certainly be accomplished once you have a model for each response. You can use the Desirability Function on the Prediction Profiler, but I would also strongly suggest using the Mixture Profiler as well. They will behave in a similar fashion to a non-mixture situation. 

 

Finally, you could utilize your example table as an input to creating a designed experiment. Use Augment Design and specify your existing rows as the "completed experiments". Specify the model you wish to fit and the total number of runs that you would like to have. JMP will choose the best runs to estimate the model that you chose. Keep in mind that the total number of runs will be the initial table plus the new runs, and JMP does not apply any methodology to the suggested number of runs, so please use your knowledge to pick a reasonable number and be sure to evaluate the resulting design before running it.

 

So just a few thoughts off the top of my head. What you want to do can be done, but much caution needs to be used with historical data and lots of time needs to be spent conditioning the data to ensure that the analysis results are going to be reliable.

Dan Obermiller


3 REPLIES


Re: How to analyze historical mixture data and maximize desirability for next formulations?

Hi Dan,

 

Appreciate your detailed reply. I agree about being careful with historical data: these would not necessarily be runs that I have done myself, so I would have to trust that the data was collected under similar conditions. As for making sure the ingredients add up to exactly 1, is there a way for JMP to automatically interpret the sum as 1? That would also solve a separate question I have about using parts instead of wt% without doing any additional calculations.

 

In the example dataset there isn't any difference between missing values and a level of 0. Is there a quick way to fill that in without doing it manually? I see what the issue with missing values is and will fill them in appropriately. Not all response variables were measured for each run, though; would that cause any issue with the analysis besides those rows not being included?

 

I'll work through your suggestions and steps and see where that leads me. Besides doing some research on Scheffé mixture modeling versus regular regression, is there any documentation, or are there similar examples of what I want to do? Any other resources to help me deepen my understanding would also be appreciated.

 

Thanks for your help! 

Re: How to analyze historical mixture data and maximize desirability for next formulations?

The easiest way to make sure that the components add to one is to make one of the components a formula that equals 1 minus all of the rest. In your situation, it would be important to make it a component that is always in the blend.
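Outside JMP, the same "slack component" idea can be sketched in Python/pandas (in JMP itself this would be a column formula; the column names here are hypothetical):

```python
import pandas as pd

def close_mixture(df, other_cols, slack_col):
    """Set slack_col = 1 - sum(other_cols), so every row sums to
    exactly 1. The slack column should be an ingredient that is
    always present in the blend."""
    df[slack_col] = 1.0 - df[other_cols].sum(axis=1)
    return df

# Example: X3 absorbs whatever the other ingredients leave over
df = pd.DataFrame({"X1": [0.2], "X2": [0.3]})
close_mixture(df, ["X1", "X2"], "X3")
```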

 

To replace the missing values with 0s, I would first select all of the factor columns (this way the missing values for the responses will not be changed to 0 -- you don't want that). Go to Edit > Search > Find. In the Find what box, enter "." (no quotes). In the Replace with field, type 0. Then be sure to select the "Restrict to Selected Columns" box and click "Replace All".
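For reference, the equivalent operation outside JMP is a fill restricted to the factor columns only, so that missing responses stay missing. A Python/pandas sketch with hypothetical column names:

```python
import pandas as pd

def zero_fill_factors(df, factor_cols):
    """Replace missing values with 0 in the ingredient (factor)
    columns only; response columns are left untouched."""
    df[factor_cols] = df[factor_cols].fillna(0.0)
    return df

# Example: X1 is an ingredient column, Y is a response column
df = pd.DataFrame({"X1": [float("nan"), 0.4], "Y": [float("nan"), 7.2]})
zero_fill_factors(df, ["X1"])
```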

 

As for missing responses not being included, the only issue MIGHT be a reduced sample size: you may not have an adequate number of trials for the model you wish to fit. And even if there are enough trials, they may not be in the proper locations to build a useful model.

 

There are many free resources on mixture modeling on the JMP website. If you wish to look at textbooks, the book Experiments with Mixtures by John Cornell is still considered the best. 

 

 

Dan Obermiller