Mmm cookies: a tale of discrete numeric variables, disallowed combinations and alias optimality
How do individual ingredients affect the taste ratings of cookies? Design of experiments can help find out. (Photo courtesy of Trish O'Grady)
When I was in graduate school, one of my hobbies was baking cookies for the department. With one of the basic chocolate chip cookie recipes, it wasn’t uncommon for me to swap the chocolate chips for another ingredient that was on sale that week (I was a grad student, after all). It also meant I had enough volunteers rating the cookies to get a reasonable response value for each batch. What if I wanted to find out how each of these ingredients individually affects the taste of the cookie?
For this example, I’m looking at walnuts, raisins, chocolate chips, pecans, coconut, toffee and brownie chunks. I can afford to make 14 batches of cookies, so a simple approach would be to make two batches with each ingredient. However, it’s going to be very difficult to pick up the differences between ingredients unless I have very little variation between batches, which is easier said than done. This sounds like a good opportunity to use design of experiments, and specifically Custom Design in JMP, so that I can use multiple ingredients in a batch! However, here are a couple of things I need to consider:
If I’m using multiple ingredients per batch, it’s very likely there are active two-factor interactions that I’m not interested in estimating.
The structure of the cookie is going to break down with too many added ingredients, so I decide not to bake cookies with more than four ingredients in a batch.
To handle the first issue, I can use an Alias Optimal design, but the second issue is a bit trickier.
Ideally, I would treat each ingredient as a two-level categorical factor, with levels indicating presence or absence. However, restricting each batch to no more than four ingredients would be difficult with categorical factors. Another idea would be to use continuous variables ranging from 0 to 1, with a linear constraint that their sum be less than or equal to four. This yields a design that looks good in terms of alias optimality, but the linear constraint makes it tough for the coordinate-exchange algorithm to find whole numbers for the ingredients, and I end up with a lot of fractional values that I don’t want to deal with. If only there were a way to use a continuous variable that wasn’t allowed to be a fraction…
Discrete Numeric Variables
Why don’t I treat them as discrete numeric variables? This way I’m still dealing with a continuous variable, but I’m restricting the number of possible values. I open up a new Custom Design and enter my discrete numeric variables, as shown in the figure below.
After clicking the “Continue” button, I can select Alias Optimality from the red triangle at the top of my open Custom Design:
I’m almost ready to create the design – I just need to set up my linear constraint... only to realize that I’m not able to use the linear constraint interface with discrete numeric variables. Now what?
I previously blogged about disallowed combinations in the context of map shapes and space-filling designs. For this example, I want the same restriction as the linear constraint, no more than four ingredients per batch – it just needs to be expressed as a disallowed combination. That is, I want to disallow any run in which the sum of the ingredients is greater than four. Heading back up to the red triangle menu and selecting “Disallowed Combinations,” I tell the Custom Designer not to use any run where the sum exceeds four:
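To see what that disallowed combination does to the candidate space, here is a minimal Python sketch. It is not JMP scripting and not how the Custom Designer searches internally. It assumes each ingredient takes only the values 0 (absent) and 1 (present), and the names `INGREDIENTS`, `MAX_PER_BATCH` and `is_disallowed` are made up for illustration.

```python
from itertools import product

# The seven add-ins from the post; 0 = absent, 1 = present (assumed coding).
INGREDIENTS = ["walnuts", "raisins", "chocolate chips", "pecans",
               "coconut", "toffee", "brownie chunks"]
MAX_PER_BATCH = 4  # structural limit on add-ins per batch


def is_disallowed(run):
    """Return True when a candidate batch uses more than four ingredients."""
    return sum(run) > MAX_PER_BATCH


# Enumerate all 2^7 presence/absence combinations and drop the disallowed ones.
all_runs = list(product((0, 1), repeat=len(INGREDIENTS)))
allowed_runs = [run for run in all_runs if not is_disallowed(run)]

print(len(all_runs), "candidate batches in total")
print(len(allowed_runs), "allowed,", len(all_runs) - len(allowed_runs), "disallowed")
```

Of the 128 possible batches, 29 use five or more ingredients and are ruled out, leaving 99 that a design could draw from.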
My Alias Optimal Design
I can now set the run size to 14, and click the “Make Design” button. I get a design that looks like this:
A quick look verifies that each batch of cookies has either three or four ingredients. But now for the moment of truth – how did we do in terms of alias optimality? Looking at the Color Map on Correlations reveals that the main effects are orthogonal to the two-factor interactions – this means that the Alias Matrix is all zeroes except in the intercept row.
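For readers who want the algebra behind that statement, the standard definition of the alias matrix is sketched below. The symbols are notation introduced here, not JMP output: $X_1$ is the model matrix for the intercept and main effects, $X_2$ holds the two-factor interaction columns, and $\beta_1$ and $\beta_2$ are the corresponding coefficients.

```latex
\mathbb{E}[\hat{\beta}_1] = \beta_1 + A\,\beta_2,
\qquad
A = (X_1^{\top} X_1)^{-1} X_1^{\top} X_2 .
```

A row of zeroes in $A$ means the corresponding estimate is not biased by any active two-factor interaction.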
As I mentioned in my previous blog post, using an Alias Optimal design involves a trade-off – there’s a loss of estimation efficiency vs. the D-optimal design. However, the Alias Optimal design gives me worry-free estimation of the main effects even in the presence of two-factor interactions.
To get all the main effects unaliased by any two-factor interaction, you need your design to consist of pairs of runs that are mirror images of each other (that is, each 0 in one row has a 1 in the corresponding column of its paired row and vice versa). This implies that you need an even number of runs in your design. So, it was fortunate that I could afford to do 14 runs!
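The mirror-image argument can be checked numerically. The sketch below uses a hypothetical 14-run design, not the one JMP generated: seven three-ingredient batches (cyclic shifts of an arbitrarily chosen starting row) plus their mirror images. It assumes -1/+1 coding for absent/present, and the helper names are invented for illustration. With seven ingredients, the mirror of a batch with k ingredients has 7 - k, so the four-ingredient cap forces every batch to have three or four.

```python
from itertools import combinations

N_FACTORS = 7
BASE_ROW = [1, 1, 0, 1, 0, 0, 0]  # arbitrary starting batch with three ingredients

# Seven batches from cyclic shifts of the base row (0 = absent, 1 = present).
half = [[BASE_ROW[(j - s) % N_FACTORS] for j in range(N_FACTORS)]
        for s in range(N_FACTORS)]
# Mirror images: every 0 becomes a 1 and vice versa.
mirrors = [[1 - x for x in row] for row in half]
design = half + mirrors  # 14 runs in total


def coded(row):
    """Recode 0/1 presence flags to -1/+1 (assumed model coding)."""
    return [2 * x - 1 for x in row]


def cross_product(col_a, col_b):
    """Sum of elementwise products of two columns; 0 means orthogonal."""
    return sum(a * b for a, b in zip(col_a, col_b))


coded_design = [coded(row) for row in design]
intercept = [1] * len(coded_design)
main_effects = [[run[j] for run in coded_design] for j in range(N_FACTORS)]
# One column per two-factor interaction: the product of two main-effect columns.
interactions = [[a * b for a, b in zip(main_effects[i], main_effects[j])]
                for i, j in combinations(range(N_FACTORS), 2)]

ingredient_counts = sorted({sum(row) for row in design})
max_main_vs_interaction = max(abs(cross_product(m, z))
                              for m in main_effects for z in interactions)
max_intercept_vs_interaction = max(abs(cross_product(intercept, z))
                                   for z in interactions)

print("ingredients per batch:", ingredient_counts)
print("largest |main effect x interaction| cross product:", max_main_vs_interaction)
print("largest |intercept x interaction| cross product:", max_intercept_vs_interaction)
```

In this sketch every main-effect column has a zero cross product with every two-factor interaction column, while the intercept column does not. That is consistent with the pattern described above, where only the intercept is aliased with the interactions.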