Insights from Transforming Factorial Variables into Mixture Variables (2025-US-30MP-2524)

This presentation examines an initially routine DOE aimed at elucidating the effects of various resin synthesis variables on the properties of a coating. Preliminary data analysis uncovered several promising correlations between the synthesis parameters and the resulting coating characteristics. To enhance our understanding, we developed a theoretical framework to predict the individual components of the resin based on the underlying reaction mechanisms and straightforward probability calculations. This innovative approach enabled us to reinterpret the original DOE variables as mixture variables, facilitating a more nuanced analysis.

By reanalyzing the data within the context of these mixture variables, we uncovered new insights into how resin composition affects coating performance, leading to novel and counterintuitive ideas for resin enhancements.

We demonstrate the application of JMP software throughout this study, highlighting several analytical tools, including the DOE platform, prediction profiler, ternary plot, scatter plot matrix, and mixture profiler.

I'm David Fenn. I'm a chemist. I work for PPG at the Coatings Innovation Center just outside Pittsburgh in Pennsylvania. Today I'm going to talk about what started as a pretty straightforward factorial DOE. But in the process, I realized that it was possible to transform some of the original factorial variables into mixture variables and gain some additional insights. I just wanted to go through that today.

In terms of agenda, I'll give a little bit of background on PPG Industries. Then we'll cover some basics for mixture experiments, and then we'll spend the bulk of the time talking about the DOE that we ran and finish with a very brief summary.

PPG is a global coatings company. We're a pretty big organization. We had a revenue last year of about $15.8 billion, about 46,000 employees, and we've been around for 140 years or more. But we're perhaps not that much of a household name. Primarily, that's because most of our sales are business-to-business. But it's probably safe to say that most of us have interacted with PPG coatings many times during our daily lives.

This slide here gives you just a brief indication of some of the areas where our coatings are used. That ranges from large structures like well-known landmarks and sports stadiums down to small devices and consumer electronics. A lot of coatings are used in cars, and there are a lot of examples in our homes and our offices: things like furniture, appliances, and flooring all use PPG coatings.

Very prevalent in the world that we live in today. The obvious role of coatings is to transform the way an object looks, to give it some color or to give it a smooth, glossy appearance. But more often than not, our coatings have to provide a number of very important technical functions as well.

Just to give you a couple of examples, when we're coating cars, not only do we provide the color and the appearance of the car, which is very important in terms of the sales process, but our coatings also have to provide things like UV resistance, acid rain resistance, stone chip resistance, and corrosion resistance. We end up applying a stack of coatings onto cars that's less than the width of a human hair. But without those coatings, cars would turn to rust very, very rapidly.

Then I'll give you one more example. We provide coatings for the cans that are used to package beverages and food. We provide coatings that go on the interior of cans, but also the exterior of those cans. The interior coating, its job is twofold. One is to protect the contents of the can from the metal that the can is made of, so that the contents don't pick up a metallic taste from the aluminum of the can, but also to protect the aluminum from what can sometimes be quite a corrosive content, so to stop the corrosion happening.

But we also apply coatings to the outside of cans, and they give us an opportunity to provide some branding on the outside of the cans. But again, they have a very important technical function. They have anti-friction properties. Without those anti-friction coatings, if we loaded crates of soda cans onto a truck and drove it across the US, by the time we got to the destination, those cans would be full of holes because of the friction between the cans. Our coatings have a very important property to bestow on that as well.

With so many technical requirements that are required from our coatings, as you can imagine, at the Coatings Innovation Center, we do a lot of DOEs, and we use JMP very heavily to help move our projects forward and to develop new coatings technology.

A lot of the DOEs we do are around formulations, so we do make quite good use of mixture experiments. I wanted to talk a little bit about the difference between a mixture experiment and the perhaps more common factorial type DOE. The difference really is that when we're dealing with a mixture experiment, the levels of the factors have to sum up to a constant. Very often, that will be 100%, but it doesn't have to be.

Under those circumstances, it's the relative proportions of those mixture components that are of interest. It's not the absolute levels. Because of that requirement to sum up to a constant, the factors are not independent. If we have a three-component mixture, for example, that sums up to 100%, by the time we've defined the first two of those mixture components, the third one is automatically defined as well because it's just the remainder of the 100% after we've set the levels of those first two.
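The constraint described above can be sketched in a few lines of Python. This is an illustration only: the function name `third_component` and the example levels are hypothetical and are not taken from the DOE in the talk.

```python
def third_component(x1: float, x2: float, total: float = 100.0) -> float:
    """Return the level of the third mixture component implied by the first two.

    Hypothetical helper: in a three-component mixture that sums to `total`,
    fixing two components leaves no freedom for the third.
    """
    x3 = total - x1 - x2
    if x1 < 0 or x2 < 0 or x3 < 0:
        raise ValueError("component levels must be non-negative and sum to the total")
    return x3


# Illustrative levels only: 50% and 30% force the third component to 20%.
remainder = third_component(50.0, 30.0)
print(remainder)
```

Because the third level is a function of the other two, the three factors cannot be varied independently the way factorial factors can.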

That covariance has some important implications for the design, analysis, and interpretation of mixture experiments. But with a bit of knowledge, there are some very useful tools in JMP that allow us to do the design and analysis, and it's relatively straightforward. Doing mixture experiments, once you know the basics, is something that's pretty easy to do.

I've done a lot of mixture experiments, and I thought I was pretty well-versed in how to do them and when to do them. But this particular DOE, which, as I said, started as a factorial DOE, really opened my eyes to the possibility that sometimes I've been doing mixture experiments without even realizing it.

Let's go to the experiment that we did. We'll talk a little bit about the DOE that we ran. This slide here is giving us just an overview of the system we were dealing with here. We're trying to design a resin for a protective coating, and this is a multistep process.

In the first step, we're taking a polyester. That's just a long-chain polymer molecule that has hydroxyl groups at either end of that polyester. Then we're reacting some of those hydroxyl groups with a modifier, and that will cause some of those hydroxyl groups to change into a different functional group. Depending on the ratio of modifier to hydroxyl groups, we'll transform some fraction of those groups into a different functional group.

Once we've done that, we add some additional unmodified polyester to that reaction product, and then we do a second polymerization reaction. The purpose of that is to allow the polymer to be dispersed into water. We then disperse it, we formulate it into a coating by adding a lot of other components. Then we spray, apply that coating onto a substrate, bake it at high temperature to cure it, and then we test the properties.

That's the multistep process that we're looking at here. Because of confidentiality, I can't go into too much detail about exactly what's happening in these stages. But the important thing is, all of these later steps in the process, we're keeping constant. There's no variation going on there. The only factors we are varying are in this first stage of the process.

We identified three factors that we wanted to look at in this DOE. We wanted to study the molecular weight of the polyester. How long is this polymer chain between the hydroxyl groups? We wanted to look at what percentage of the polyester we put in this very first stage. What fraction of the polyester comes in here, and what fraction do we add in this second stage? Then finally, we wanted to look at the ratio of how many modifier groups do we have to how many hydroxyl groups do we have.

You can see we've got some ranges defined that we want to study in this particular DOE. We're trying to answer three questions here. One is, which of these factors have the largest effect on our coatings properties? Then, for the factors that do have an effect, what changes are they making to the polymer structure that are responsible for the property changes we're seeing?

We want to try and understand what chemistry is going on here, and how is it improving? Then the final question is, once we've learned everything from this DOE, what should we investigate next to try and optimize our performance? Where should we go to work next? They're the three questions we're really focusing on here.

Let's go into JMP, and we'll look at the DOE. Here we have our three factors here. We have our polyester molecular weight. We have two levels here of molecular weight. We have our percentage of polyester in the very first stage, and we have three levels that we're studying here for this. Then, finally, the ratio of modifier to hydroxyl groups. Again, we've got three levels that we're studying there.

In reality, we measured lots and lots of properties for this coating. But for simplicity, for today, in the data sheet, I've only included one property here, and that's the defect rating. Defects can be anything from a crater in the coating, a bubble in the coating. It could be a small hole that goes all the way through to the substrate.

But as well as affecting the appearance of that coating, because it's supposed to be a protective coating, the defects could provide a path through the coating to the substrate to allow something corrosive to get to the substrate. It's very important here that we want to minimize the number of defects. This is a numerical rating for how many defects we're seeing. The ideal here is to drive this to as low as possible and ideally down to zero.

We can go ahead and start to analyze this data now. We'll go to Analyze and Fit Model, add our defect rating as the Y, select our three factorial variables, and go into Macros and select Factorial to Degree. Now we've added the main effects, but also the interaction terms.

We can go ahead and run the model here. If we look at this effect summary, we can see that some of the factors have a nice low P value, but there are some pretty high P values here as well. The first thing we want to do is reduce this model and take out any factors that have high P values. We'll start with the interaction terms. I'll start to remove those.

Then we get down to the main effects, and we're still seeing high P values. We'll remove some of these as well. Where we end up is it seems like there's only one factor that really seems to be driving our defect resistance, the number of defects we're seeing, and that's our ratio of modifier to hydroxyl groups.

If we go down to our prediction profiler, we can see that driving our modifier to hydroxyl ratio up to higher levels seems to be the direction we want to go. That's reducing the number of defects that we're observing. But if we look in a bit more detail at our actual by predicted plot, it seems like maybe there's a bit of curvature here that we're not accounting for.

We also see that in our residual by predicted plot. Ideally, we'd see this as a random distribution of these points, and it looks like there's a bit of curvature going on here. There are various ways that we can take care of that. I tried several different approaches. I won't go through those all, but what actually seemed to have the biggest effect and the most satisfactory effect was to add a squared term for our modifier to hydroxyl ratio.

I'll do that by just going into Macros and adding a Polynomial to Degree. Now I've got my main effect for the modifier, and I've got a squared term here as well. If I run that model, I can now see from my actual by predicted plot that I seem to have got rid of that curvature problem. I also see in my residual by predicted plot that this looks a lot more satisfactory. That's taken care of that little bit of curvature that we had.
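The curvature check described above can be illustrated outside JMP. The sketch below uses invented, noise-free numbers (not the DOE data, which is not given in the talk) and NumPy's `Polynomial.fit` as a stand-in for the Fit Model platform. It only shows the pattern: a straight-line fit to a curved response leaves structured residuals, and adding a squared term removes them.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Invented, noise-free illustration data (NOT from the DOE in the talk):
# three hypothetical ratio levels, two runs each, with a curved response.
ratio = np.array([0.1, 0.1, 0.2, 0.2, 0.3, 0.3])
defects = 200 - 1000 * ratio + 1500 * ratio**2

# Main effect only, then main effect plus squared term.
linear_fit = Polynomial.fit(ratio, defects, deg=1)
quadratic_fit = Polynomial.fit(ratio, defects, deg=2)

linear_residuals = defects - linear_fit(ratio)
quadratic_residuals = defects - quadratic_fit(ratio)

# The straight-line residuals are positive at the outer levels and negative
# at the middle level (a curvature signature); the quadratic residuals are
# zero up to floating-point rounding.
print(np.round(linear_residuals, 3))
print(np.round(quadratic_residuals, 3))
```

With real, noisy data the quadratic residuals would not vanish; the point is only that the systematic bow in the residual plot disappears once the squared term is in the model.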

If I go back to my prediction profiler, the conclusion is essentially still the same, that we want to drive to a higher modifier to hydroxyl ratio. That's where we ended up by doing the analysis.

Nice in some ways, because we learned one thing: we learned where we need to go in terms of the modifier to hydroxyl ratio. But perhaps a little bit disappointing that only one factor seemed to be playing a role. It's a little bit difficult to progress from there and really try and understand chemically what's going on in the system and why this is improving the properties.

Having done that, I started to think in a bit more detail about the system that we were dealing with. If we look at a bit more detail at this very first stage here where we were reacting our polyester with a modifier, there's really only three things that could possibly happen in this stage.

If we take one of these polyester molecules, it could end up with a modifier at one end and an unreacted hydroxyl at the other end, or it could end up with the hydroxyls being modified at both ends of the polyester, or we could end up with some polyester coming through this process and not getting modified at all. They're the only three possible reaction products we could get from this reaction.

Then we add some additional polyester after we've done this, which will change the ratio of these three. But still, we only have three possible reaction products. It turns out it's actually pretty simple to calculate what the expected ratio is from any given set of reaction conditions. Once I know what levels I'm setting my original factorial factors, I'll know what ratio of modifier to hydroxyl group I'm starting with.

In this example, say our ratio was five hydroxyl groups to one modifier. What I can say from that is that if I select any hydroxyl group at random, the probability that that's going to react is 20%. I can take that a bit further. If I pick a polyester molecule at random, the probability that both of those hydroxyl groups are going to be modified is the probability that any one will be modified, squared: 20% squared, or 0.2 times 0.2.

I can calculate that the probability that any polyester molecule is going to be modified at both ends is 4%. Similarly, I can calculate that the probability that only one is going to react is the probability that any one hydroxyl reacts, times the probability that any one hydroxyl doesn't react. Then I have to multiply that by two because there's two different arrangements that I could have for one reacted and one not reacted. That turns out to be 0.32. There's a 32% chance that any polyester molecule is going to be modified at one end.

Then finally, I can calculate the probability that none of the hydroxyls are going to react in a polyester molecule, which is the probability that any one doesn't react, squared, so 0.8 times 0.8. It turns out in these circumstances, there's a 4% chance that both ends are going to be modified, there's a 32% chance that one is going to be modified, and there's a 64% chance that none of them are going to be modified.
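The probability calculation above can be written as a short function. This is a minimal sketch, assuming each hydroxyl end reacts independently with the same probability. The function name and the optional `stage1_fraction` argument are hypothetical: the talk says that adding unmodified polyester afterwards changes the ratio of the three products but does not give the formula, so the dilution step here assumes `stage1_fraction` is the fraction of all polyester chains that went through the modification stage.

```python
def end_group_distribution(p_react: float, stage1_fraction: float = 1.0) -> dict:
    """Fractions of chains modified at both ends, one end, or neither end.

    p_react: probability that any one hydroxyl end reacts (0.2 in the
        talk's example of one modifier per five hydroxyl groups).
    stage1_fraction: ASSUMPTION, not from the talk. Fraction of all chains
        present in the modification stage; the rest is added unmodified.
    """
    if not 0.0 <= p_react <= 1.0:
        raise ValueError("p_react must be between 0 and 1")
    if not 0.0 < stage1_fraction <= 1.0:
        raise ValueError("stage1_fraction must be in (0, 1]")

    both = p_react**2                    # both ends react
    one = 2 * p_react * (1 - p_react)    # two arrangements: either end reacts
    none = (1 - p_react) ** 2            # neither end reacts

    f = stage1_fraction
    # Chains added after the reaction all count as unmodified.
    return {"both": f * both, "one": f * one, "none": f * none + (1 - f)}


# The talk's example: 20% reaction probability, no later dilution.
print(end_group_distribution(0.2))
```

For `p_react = 0.2` with no dilution this reproduces the 4%, 32%, and 64% split worked through above, and the three fractions always sum to 1, which is what makes them mixture variables.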

We're ending up with a mixture, and we know the content of that mixture from this DOE, even though we started with factorial factors, where in reality, we're making a mixture. What can we do with that information? To my JMP table, I added my calculated distribution of molecules with both hydroxyls modified, molecules with one, and molecules with no hydroxyl modified. For each run, I calculated what that distribution should be.

Now, rather than building a model for defects based on my original factorial variables, I can build a model based on these new distribution variables. Because this is a mixture and has to sum to 100%, I have to treat these as mixture variables, and I now have to carry out the analysis as a mixture rather than a factorial DOE.

The first thing I want to do is understand how these points are distributed in my experimental space. That's pretty straightforward to do. I go into Graph and Ternary Plot, select my three mixture variables, and hit OK. I'll just make this a little bit larger. This is the experimental space that I'm working in, and I can see my points seem to be pretty nicely distributed within this space. That gives me a degree of confidence that it's going to be worthwhile to have a look and see what effect these mixture variables have on building a model.
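For readers without JMP at hand, the geometry behind a ternary plot is simple enough to sketch. The function below maps a three-component mixture onto an equilateral triangle; the name `ternary_to_xy` and the choice of which component sits at which corner are arbitrary assumptions and need not match JMP's Ternary Plot layout.

```python
import math


def ternary_to_xy(a: float, b: float, c: float) -> tuple:
    """Map mixture proportions (a, b, c) to 2-D triangle coordinates.

    Hypothetical layout: pure a at (0, 0), pure b at (1, 0),
    pure c at (0.5, sqrt(3)/2).
    """
    total = a + b + c
    if total <= 0:
        raise ValueError("mixture must have a positive total")
    # Normalize so the function accepts fractions or percentages.
    b, c = b / total, c / total
    x = b + 0.5 * c
    y = (math.sqrt(3) / 2) * c
    return x, y


# The talk's worked example: 4% both, 32% one, 64% none.
print(ternary_to_xy(4, 32, 64))
```

Plotting each run's coordinates on the same triangle shows how well the runs cover the mixture space, which is the check being made in the ternary plot here.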

The next thing I'll do now is go ahead and build a model. We'll go to Analyze and Fit Model. I've already got my factors loaded here. I've got my three mixture variables and three blending terms here. I've got my defect rating as my Y. Because this is a mixture, I prefer to do a stepwise regression to build the model. We'll go to Stepwise and hit Run.

There are various ways of building this model. Again, because it's a mixture, these main effects we have to leave locked into the model. The decision we need to make is which of these blending terms deserve to be in the reduced model. Again, I tried various different approaches, and they all gave pretty much the same answer. But for now, we'll work on minimizing AICc. We'll select that approach, forward direction, and we'll just hit Go.

Where we end up is that only one of these blending terms ends up in the model, and that's percent both hydroxyls by percent no hydroxyls. Once we've got that, we can go ahead and run the model. A couple of things I'd like to show you. One is, if we look at our actual by predicted, it looks pretty reasonable. We've got a nice R² of 0.88 here. I think we had an R² of about 0.82 before for our factorial model, so maybe a slightly better model.
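The reduced model described above has the form of a Scheffé mixture model: three linear terms, one blending term, and no intercept, because the three proportions already sum to 1 and an intercept would be redundant. The sketch below fits that form with ordinary least squares in NumPy. The proportions, coefficients, and responses are all invented for illustration; they are not the DOE data or the fitted JMP model.

```python
import numpy as np


def scheffe_columns(both, one, none):
    """Model matrix: three mixture main effects plus the both*none blend term."""
    both, one, none = (np.asarray(v, dtype=float) for v in (both, one, none))
    return np.column_stack([both, one, none, both * none])


# Invented mixture proportions (each row sums to 1); NOT the DOE runs.
both = np.array([0.04, 0.16, 0.36, 0.02, 0.08, 0.18])
one = np.array([0.32, 0.48, 0.48, 0.16, 0.24, 0.24])
none = np.array([0.64, 0.36, 0.16, 0.82, 0.68, 0.58])

# Invented coefficients used only to generate a noise-free response.
true_coefs = np.array([20.0, 40.0, 150.0, -100.0])

X = scheffe_columns(both, one, none)
defects = X @ true_coefs

# No intercept column: the mixture main effects span the constant term.
coefs, *_ = np.linalg.lstsq(X, defects, rcond=None)
print(np.round(coefs, 3))
```

Since the response here is generated without noise, the fit simply recovers the invented coefficients; with real data the same matrix would feed whatever term-selection rule is in use, such as the AICc-based stepwise selection used in the talk.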

Then, if we look at the profiler, now what we're seeing here is the effect of these three distributions, the distribution of these three polyester components on our defect rating.

Now we can see that if we want to minimize our defects, what we need to do is drive the amount of polyester that's got no modifier on it down as low as we can get. Simultaneously, we want to move the amounts modified at one end or at both ends higher.

That's nice because it's now telling us from a molecular point of view, what's going on as we're improving our defect resistance. I've added a desirability function here that's going to increase desirability as we go to lower defects. I could go here and maximize desirability. This is essentially what we expected to happen. We get the best performance when we have a low level of no hydroxyl modification and higher levels of one or both.

That's a nice way of visualizing what's going on. I can also quickly look at the mixture profiler. It's just another way of visualizing the same situation. Make this a little bit smaller, so it fits on the screen. What I'll do is I'll get rid of the contour to start with. Then I'll add a contour grid, and we'll go from 0-150 defect rating.

We'll go in increments of 25. Then, if I try and make this a bit bigger… We can see, looking at the contours here, we get to a lower defect rating as we travel in this general direction. That's consistent with what we were seeing with the profiler. We're going to lower no hydroxyl and higher either one hydroxyl or both hydroxyls modified as we go in this general direction. Just another way of visualizing the conclusion there.

One final thing that I wanted to show you here: into this data sheet, I loaded my predictive equation for defects based on my factorial experiment, and I loaded the predictive equation based on the mixture approach, and I want to compare those. I'll just do a Fit Y by X here. I'll select the Y as my defect rating and my X as my two predictive models, because that'll give me a nice side-by-side comparison of the two models.

Then I'll add Fit Line to both of these. I can see a couple of things here. If we look at our factorial model, there are only three predicted levels it's giving for defect resistance, whereas my mixture model seems to be predicting a wider range of different results.

I can also see, as we noticed earlier, the R² is slightly better for the mixture model. Some potential advantages there, but really the main advantage here was that it really helped me to visualize what was going on at a molecular level with this system.

In terms of learning, we learned, obviously, that lower levels of unmodified polyester and higher levels of modified polyester help with defect resistance. We can now go ahead and think, how can we extend the design space outside the original DOE to get further improvements? We can start to think about why those changes improve the defect resistance. How is that final resin different?

Then we can think about, are there other strategies? Now we know what type of molecule we want to make. Are there other strategies we could use to make that preferred structure? Can't go into too much detail again because of confidentiality here. But really understanding at a molecular level through those mixture variables, what was going on was what allowed us to think more deeply about some of these questions, and in fact, helped us to give some answers here and move the project forward.

One other thing: if we want to move forward using this distribution of different products, we don't have a direct way of going from our original factorial variables to the mixture variables. But that's pretty straightforward to resolve. If I go into this JMP sheet here and build a model that allows me to predict what my distribution is going to be based on my original factorial variables, that can allow me to move forward and use these mixture variables as a way to change the end result. That's very straightforward to do.

If we go to Analyze and Fit Model, I'll just show you one of these. If I wanted to build a model for percent both hydroxyls modified, I'd add that as a Y and then select my three factorial variables. Again, do Factorial to Degree 2, and then just hit Run. Then I've got my P values here, so I can just remove anything that's got a high P value. Then we end up with this reduced model. Because it's theoretical, we expect this, but it's got a very nice R².

I can do that for all three of the mixture variables. Then I can add my prediction equation to the data sheet. Once I've done that, I can go into Graph, then into Profiler, and add my three prediction formulas. Now I've added a desirability function here with a goal of matching a particular target. Here we've got predicted levels of our three different reaction products, and our variables here are our stage one polyester percent and our modifier to hydroxyl ratio, some of our original factorial variables.

If I wanted to achieve a particular target for the distribution of these three mixture variables, I just set my match target for where I wanted that to be. I could move these around if I wanted a different target, but I'll leave them roughly where I set them. Now I could just go into optimization and maximize desirability. Then that gives us the levels of our original factorial variables that give us any target mixture of our new mixture variables.

That's an easy way of transferring between those two different types of variables. Just in terms of general summary, what I've learned from this is that factorial DOEs sometimes produce mixtures. It's something I wasn't really thinking about before, but now I have. I see quite a few examples where that's the case.

If we're in a situation where that mixture composition can be either predicted theoretically or determined analytically, we have the option to build models based on both the original factorial variables and the mixture variables. Sometimes, as hopefully I've shown here, considering the responses in terms of the mixture factors can give a bit more of a fundamental understanding about what's going on in the system, and it can help us to potentially identify new ways of achieving the goal or some alternative solutions.

Presenter

David Fenn

Skill level

Intermediate

Austin, TX October 21-24

Data Exploration and Visualization

Design of Experiments