
How to analyze historical mixture data and maximize desirability for next formulations?

Hello everyone,

 

I'm relatively new to analyzing mixture formulations across multiple response variables and could use some guidance. Currently, I'm working on a new dataset that incorporates historical data from various formulations within our company. My aim is to leverage JMP for analyzing how different weight percentages of ingredients impact these response variables.

Attached is an example dataset, and I'm seeking advice on the best approach for processing this data. Historically, we've evaluated 24 different ingredients and are keen on maximizing desirability in our formulations.

 

Essentially, I'm looking to identify trends between the weight percentages of specific ingredients and the corresponding response variables. Additionally, I'm interested in recommendations on 'ideal' formulations within certain bounds. It would be immensely helpful if the analysis could also suggest a Design of Experiments (DOE) approach to test next, building on our existing dataset.

 

Any insights or suggestions would be greatly appreciated.

Thank you in advance!

 

1 ACCEPTED SOLUTION


Re: How to analyze historical mixture data and maximize desirability for next formulations?

Analyzing historical mixture data has a few pitfalls. Many times people will perform a designed experiment rather than trying to utilize historical data. Historical data was collected under varying conditions (which likely were not recorded -- think equipment changes as one example) and with different purposes in mind, which may make combining that data nonsensical. Even the time differences of when the data was collected could cause problems with combining it. However, if you want to press on, here are some additional thoughts and concerns.

 

You must make sure that your ingredient columns add to EXACTLY 1. Round-off error is not going to be acceptable. For example, in your example dataset, row number 2 adds up to 1.000000001. When you fit the mixture model, you definitely should look at the Analysis of Variance report (you will have to request it as it is not shown by default). You should see a message at the bottom of that table that says "Tested against reduced model Y=mean".  If that message is there, you are fitting a mixture model. But if that message says "Tested against reduced model Y=0", then you are NOT fitting a mixture model. You are fitting a no-intercept model which is not the same thing. This last message is usually caused by the mixture ingredients not always adding to 1.
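If it helps, here is a minimal way to audit those row sums outside JMP before modeling. This is only a sketch in Python/pandas, and it assumes the table has been exported to a file called formulations.csv and that the 24 weight-percent columns share a common name prefix (both assumptions, not anything from your actual table):

import pandas as pd

# Assumed export of the JMP table; adjust the file name and column prefix.
df = pd.read_csv("formulations.csv")
ingredient_cols = [c for c in df.columns if c.startswith("Ingredient")]  # assumed names

# Every mixture row should sum to exactly 1; flag anything that does not.
row_sums = df[ingredient_cols].sum(axis=1)
off_rows = (row_sums - 1.0).abs() > 1e-12
print(row_sums[off_rows])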

 

From your example dataset, is there a difference between a missing value for an ingredient and a level of 0? Conceptually, I believe you could replace all of your missing values with 0's. It may be helpful to do that because a missing value for an input variable could lead to that observation being excluded from the analysis.
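As a quick diagnostic before deciding to recode, you can count how many ingredient cells are blank rather than 0. Again a sketch, with the same assumed CSV export and column names as above:

import pandas as pd

df = pd.read_csv("formulations.csv")                                      # assumed export
ingredient_cols = [c for c in df.columns if c.startswith("Ingredient")]   # assumed names

# Blank cells here will be treated as missing, not as 0, unless you recode them.
print(df[ingredient_cols].isna().sum().sort_values(ascending=False))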

 

This is also true for responses. If a response value is missing, that row cannot be used when modeling that response. Therefore, you will need to analyze each response variable separately in order to utilize as much data as possible for each model.
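A quick way to see how much data each response model would actually have is to count the non-missing values per response column. A sketch, again assuming the same CSV export and hypothetical response column names:

import pandas as pd

df = pd.read_csv("formulations.csv")                        # assumed export
response_cols = ["Response 1", "Response 2", "Response 3"]  # hypothetical names

# Rows available for each separate response model.
print(df[response_cols].notna().sum())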

 

If your example data is realistic, you have MANY ingredients. This will naturally lead to a VERY large model. Since you are looking for optimization, you will want a model with at least the 2-way blending terms (they look like interactions) and possibly the special cubic terms (they look like 3-way interactions). You will need many observations to fit that kind of model. Just using 2-way terms, you will have 24 main effects + 276 two-way terms, which is 300 model terms, so you need at least 300 observations. Further, a model that large will be very hard to interpret and visualize. Again, a designed experiment would let you focus on key questions and work with fewer ingredients to answer them.
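For reference, the term counts grow quickly with the number of ingredients. A small sketch of the arithmetic for a Scheffé model in q ingredients:

from math import comb

q = 24
linear        = q                                  # main effects only
quadratic     = q + comb(q, 2)                     # + 2-way blending terms -> 300
special_cubic = q + comb(q, 2) + comb(q, 3)        # + 3-way terms -> 2324
print(linear, quadratic, special_cubic)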

 

Before building any model, you should turn on pseudo-component coding for the mixture ingredients in order to reduce the correlations among the model terms. The coding will not remove the correlation entirely, but it will minimize it, and that will help the analysis.
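For what it's worth, the coding applied here is (as far as I know) the usual L-pseudo-component transform: with lower bounds L_i on the ingredients, each proportion is re-expressed as z_i = (x_i - L_i) / (1 - sum of the L_i). JMP does this for you when the option is on; the tiny standalone sketch below, with made-up bounds, is only to show the arithmetic:

import numpy as np

x     = np.array([0.60, 0.30, 0.10])   # one blend (proportions sum to 1)
lower = np.array([0.40, 0.10, 0.00])   # hypothetical lower bounds
z = (x - lower) / (1.0 - lower.sum())
print(z, z.sum())                       # pseudo-components also sum to 1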

 

You should also turn on the Mixture column property, which it looks like you did for your example. 

 

If you have not done so yet, do some research on how fitting a Scheffé mixture model differs from regular regression. The interpretation of a mixture model is different.
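In case it helps to have the form in front of you: the Scheffé quadratic model is y = sum_i (b_i * x_i) + sum_{i<j} (b_ij * x_i * x_j), with no intercept and no squared terms, because the constraint that the x_i sum to 1 makes those terms redundant. That is why the coefficients are read as blending effects rather than as ordinary slopes.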

 

Optimization of multiple responses can certainly be accomplished once you have a model for each response. You can use the desirability functions in the Prediction Profiler, and I would strongly suggest using the Mixture Profiler as well. They behave in a similar fashion to a non-mixture situation.
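Outside of JMP, the same idea can be sketched directly: fit a prediction function per response, map each prediction to a 0-1 desirability, and maximize their geometric mean subject to the mixture constraint. The sketch below uses made-up three-ingredient models and made-up desirability limits purely for illustration; it is not JMP's implementation:

import numpy as np
from scipy.optimize import minimize

def pred_y1(x):   # hypothetical fitted model for response 1
    return 10*x[0] + 8*x[1] + 5*x[2] + 12*x[0]*x[1]

def pred_y2(x):   # hypothetical fitted model for response 2
    return 3*x[0] + 6*x[1] + 9*x[2] - 4*x[1]*x[2]

def d_larger_is_better(y, lo, hi):
    # Linear "larger is better" desirability, clipped to [0, 1].
    return float(np.clip((y - lo) / (hi - lo), 0.0, 1.0))

def overall_desirability(x):
    d1 = d_larger_is_better(pred_y1(x), lo=5.0, hi=15.0)   # made-up limits
    d2 = d_larger_is_better(pred_y2(x), lo=2.0, hi=10.0)   # made-up limits
    return (d1 * d2) ** 0.5                                 # geometric mean

res = minimize(
    lambda x: -overall_desirability(x),                     # maximize by minimizing the negative
    x0=np.array([1/3, 1/3, 1/3]),
    bounds=[(0.0, 1.0)] * 3,
    constraints=[{"type": "eq", "fun": lambda x: x.sum() - 1.0}],  # mixture constraint
    method="SLSQP",
)
print(res.x, -res.fun)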

 

Finally, you could utilize your example table as an input to creating a designed experiment. Use Augment Design and specify your existing rows as the "completed experiments". Specify the model you wish to fit and the total number of runs that you would like to have, and JMP will choose the best new runs to estimate that model. Keep in mind that the total number of runs includes the initial table plus the new runs, and that JMP's suggested number of runs is only a default rather than the result of any methodology, so use your knowledge to pick a reasonable number and be sure to evaluate the resulting design before running it.

 

Those are just a few thoughts off the top of my head. What you want to do can be done, but it requires a lot of caution with historical data and a lot of time spent conditioning the data to ensure that the analysis results will be reliable.

Dan Obermiller


3 REPLIES


Re: How to analyze historical mixture data and maximize desirability for next formulations?

Hi Dan,

 

Appreciate your detailed reply. I agree about being careful with historical data; these would not necessarily be runs that I have done myself, so I would have to trust that the data were collected under similar conditions. As for making sure the ingredients sum to exactly 1, is there a way to have JMP automatically interpret the sum as 1? That would also solve a separate question I have about using parts instead of wt% without doing any additional calculations.

 

In the example dataset, there isn't any difference between a missing value and a level of 0. Is there a quick way to fill those in without doing it manually? I see what the issue with missing values is and will fill them in appropriately. Not all response variables were measured for each run, so would that cause any issue with the analysis beyond those rows not being included?

 

I'll work through your suggestions and steps and see where that leads me. Besides doing some research on Scheffé mixture modeling versus regular regression, is there any documentation or are there similar examples of what I want to do? Any other resources to help me deepen my understanding would also be appreciated.

 

Thanks for your help! 

Re: How to analyze historical mixture data and maximize desirability for next formulations?

The easiest way to make sure that the components add to one is to make one of the components a formula that equals (1 - the sum of all the rest). In your situation, it would be important to make that a component that is always in the blend.
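Inside JMP that is just a formula column on the always-present ingredient; outside JMP the same trick is one line of pandas. A sketch, assuming the usual made-up export and column names and that "Ingredient 1" is the component that is always in the blend:

import pandas as pd

df = pd.read_csv("formulations.csv")                                     # assumed export
others = [c for c in df.columns
          if c.startswith("Ingredient") and c != "Ingredient 1"]         # assumed names

# The always-present component picks up whatever is left, so every row sums to exactly 1.
df["Ingredient 1"] = 1.0 - df[others].fillna(0).sum(axis=1)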

 

To replace the missing values with 0's, I would first select all of the factor columns (that way the missing values for the responses will not be changed to 0 -- you don't want that). Go to Edit > Search > Find. In the Find what box, enter a "." (no quotes), and in the Replace With field, type 0. Then be sure to select the "Restrict to Selected Columns" box and click "Replace All".
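If you prefer to do that recode outside of JMP, the equivalent in pandas (same assumed export and column names as before) restricts the fill to the factor columns so missing responses stay missing:

import pandas as pd

df = pd.read_csv("formulations.csv")                                      # assumed export
ingredient_cols = [c for c in df.columns if c.startswith("Ingredient")]   # assumed names

df[ingredient_cols] = df[ingredient_cols].fillna(0)   # factors only; responses untouched
df.to_csv("formulations_clean.csv", index=False)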

 

As for missing responses not being included, the only issue MIGHT be that you end up with a reduced sample size. You may not have an adequate number of trials for the model you wish to fit, or even if there are enough trials, they may not be in the proper locations to build a useful model.

 

There are many free resources on mixture modeling on the JMP website. If you wish to look at textbooks, the book Experiments with Mixtures by John Cornell is still considered the best. 

 

 

Dan Obermiller