Potato chip smackdown: A generalized regression analysis
To try this, I needed to set up a new data table in which each pair of flavors compared together in a choice set gets its own row, with one flavor coded 1 and the other coded -1. Then I needed columns for the number of times each pair was compared and the number of times the flavor coded 1 was chosen over the flavor coded -1. Note that I didn't need an additional row with the +1/-1 values reversed, because the number of times the -1 flavor won is simply the times compared minus the times chosen.
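A minimal Python sketch of that bookkeeping, assuming the raw results are available as a list of (winner, loser) outcomes. The function name `pairwise_rows` and the example outcomes are hypothetical (not the taste-test data or a JMP feature). It stores each pair once and shows why the reversed row adds nothing: the -1 flavor's wins are always times compared minus times chosen.

```python
from collections import defaultdict


def pairwise_rows(outcomes, flavors):
    """Collapse (winner, loser) outcomes into one row per pair of flavors.

    The order of `flavors` decides which flavor in a pair is coded +1 (the
    earlier one) and which is coded -1 (the later one), so each pair appears
    once. Rows are (plus_flavor, minus_flavor, times_chosen, times_compared),
    where times_chosen counts wins by the +1 flavor.
    """
    position = {f: i for i, f in enumerate(flavors)}
    chosen = defaultdict(int)
    compared = defaultdict(int)
    for winner, loser in outcomes:
        plus, minus = sorted((winner, loser), key=position.__getitem__)
        compared[(plus, minus)] += 1
        if winner == plus:
            chosen[(plus, minus)] += 1
    ordered = sorted(compared, key=lambda pair: (position[pair[0]], position[pair[1]]))
    return [(plus, minus, chosen[(plus, minus)], compared[(plus, minus)])
            for plus, minus in ordered]


# Hypothetical outcomes, for illustration only.
flavors = ["BBQ", "Southern BBQ", "Greektown Gyro"]
outcomes = [("BBQ", "Greektown Gyro"), ("Greektown Gyro", "BBQ"),
            ("BBQ", "Greektown Gyro"), ("Southern BBQ", "BBQ")]
for plus, minus, times_chosen, times_compared in pairwise_rows(outcomes, flavors):
    # A reversed row would only repeat this: -1 flavor wins = compared - chosen.
    print(plus, minus, times_chosen, times_compared, times_compared - times_chosen)
```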
If you read about the experimental setup, you may recall that we didn't test the chips in pairs, but three at a time. To do this analysis, I split each choice set into the three pairs it contains. Because the best and the worst flavor were recorded for each choice set, the full ranking of the three flavors is known, so I can determine the winner and loser of every pair.
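As a rough sketch of that split, here is a small Python function (the name `split_choice_set` and the example response are hypothetical) that turns one best/worst answer on a set of three flavors into its three pairwise outcomes. The middle flavor is inferred as whichever one was neither best nor worst.

```python
def split_choice_set(choice_set, best, worst):
    """Split one best/worst response on three flavors into three (winner, loser) pairs."""
    # With three flavors, naming the best and the worst pins down the middle one.
    # The unpacking raises ValueError unless exactly one flavor is left over.
    (middle,) = [f for f in choice_set if f not in (best, worst)]
    # best beats both others; middle beats worst.
    return [(best, middle), (best, worst), (middle, worst)]


# Hypothetical choice set and response, for illustration only.
pairs = split_choice_set(["Truffle Fries", "BBQ", "Greektown Gyro"],
                         best="BBQ", worst="Greektown Gyro")
print(pairs)
```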
I found it easiest to set up the table the same way Clay did with the basketball data: put a 1 in the column for the first flavor, then put a -1 in the column of each remaining flavor to the right, one flavor at a time, moving down one row at a time (for those of you who like to think in matrices, the columns to the right of the first flavor form something that looks like a negative identity matrix). Just to give you an idea of what this looks like, here are the first few rows of my data table:
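A short Python sketch of that row layout, assuming the columns follow the flavors in a fixed order (the post doesn't list its column order, so column i here is simply "the i-th flavor", and `design_rows` is a hypothetical name). Each row covers one pair: +1 for the earlier flavor and -1 for the later one. Every row sums to zero, so only differences between flavor effects can be estimated.

```python
from itertools import combinations


def design_rows(n_flavors):
    """Effect-coded rows for every pair of flavors.

    Row for pair (i, j) with i < j: +1 in column i, -1 in column j, 0 elsewhere.
    The first n_flavors - 1 rows pair flavor 0 with each remaining flavor in turn.
    """
    rows = []
    for i, j in combinations(range(n_flavors), 2):
        row = [0] * n_flavors
        row[i] = 1
        row[j] = -1
        rows.append(row)
    return rows


for row in design_rows(4):
    print(row)
```

With four placeholder flavors this prints six rows, starting with `[1, -1, 0, 0]`, `[1, 0, -1, 0]` and `[1, 0, 0, -1]`: a column of 1s followed by a negative identity block, as described above.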
You can take a look at the full data table on the File Exchange. I didn’t create a nice script to collect the data in the format I wanted – it was a matter of using table summaries and manually inputting the times chosen and compared for each pair.
Now that I have the data table I wanted, it's off to Fit Model. I select the times chosen and times compared as the Y variables, add all the chip flavors as effects, and select No Intercept (the Bradley-Terry model has no intercept term). Using the Generalized Regression personality with a Binomial distribution (we're modeling the probability that one flavor gets picked over the other), here is what I had:
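Under the Bradley-Terry model, the log-odds that flavor i is chosen over flavor j is beta_i - beta_j, which is a binomial (logistic) regression on the +1/-1 columns with no intercept. As a cross-check outside JMP, the sketch below fits the same likelihood with a different algorithm, the minorization-maximization (MM) iteration for Bradley-Terry, rather than the regression fit JMP performs. The function name and input format are assumptions, it does no variable selection, and it reports log-strengths centered to sum to zero, which may not match the parameterization JMP reports. It also assumes no flavor or group of flavors wins (or loses) every comparison against the rest; otherwise the maximum-likelihood estimates do not exist.

```python
import math


def fit_bradley_terry(rows, max_iter=10_000, tol=1e-12):
    """Maximum-likelihood Bradley-Terry log-strengths via the MM iteration.

    rows: iterable of (plus_flavor, minus_flavor, times_chosen, times_compared),
    where times_chosen counts wins by the +1 flavor.
    Returns {flavor: log-strength}, centered so the values sum to zero.
    """
    wins = {}      # total wins for each flavor
    matchups = {}  # matchups[i][j] = number of times i and j were compared
    for plus, minus, chosen, compared in rows:
        wins[plus] = wins.get(plus, 0) + chosen
        wins[minus] = wins.get(minus, 0) + (compared - chosen)
        for a, b in ((plus, minus), (minus, plus)):
            matchups.setdefault(a, {})
            matchups[a][b] = matchups[a].get(b, 0) + compared

    strength = {f: 1.0 for f in wins}
    for _ in range(max_iter):
        # MM update: new p_i = W_i / sum_j n_ij / (p_i + p_j)
        updated = {
            i: wins[i] / sum(n_ij / (strength[i] + strength[j])
                             for j, n_ij in matchups[i].items())
            for i in strength
        }
        # Only ratios of strengths matter; rescale to a geometric mean of 1.
        scale = math.exp(sum(math.log(v) for v in updated.values()) / len(updated))
        updated = {f: v / scale for f, v in updated.items()}
        change = max(abs(updated[f] - strength[f]) for f in strength)
        strength = updated
        if change < tol:
            break
    return {f: math.log(v) for f, v in strength.items()}


# Hypothetical pairwise counts, not the taste-test data.
rows = [("A", "B", 2, 3), ("A", "C", 2, 3), ("B", "C", 2, 3)]
print(fit_bradley_terry(rows))
```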
I chose forward selection with AICc validation, although BIC gives the same results. The chosen model looks like this:
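For reference, these are the standard definitions of the two criteria, where L is the maximized likelihood, k the number of estimated parameters and n the sample size. How JMP counts n for aggregated binomial rows (rows versus total trials) isn't stated here, so treat that detail as an assumption. Roughly, forward selection adds one flavor effect at a time, and the model at the step with the smallest criterion value is kept.

```latex
\mathrm{AIC} = -2\log L + 2k, \qquad
\mathrm{AICc} = \mathrm{AIC} + \frac{2k(k+1)}{n-k-1}, \qquad
\mathrm{BIC} = -2\log L + k\log n
```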
Using the pairwise Bradley-Terry model, BBQ and Southern BBQ come out on top, followed by Southern Biscuits and Gravy and Truffle Fries (note that these positions switched relative to the Choice model, but the estimates are close). The remaining flavors were all grouped together, except for poor Greektown Gyro, which wasn't very popular with the JMP group.
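Estimates on this scale translate into head-to-head probabilities through P(i chosen over j) = 1 / (1 + exp(-(beta_i - beta_j))). A quick Python sketch with made-up estimates (not the values from this analysis; the flavor labels are placeholders): one reading of "grouped together" is that those flavors share the same estimate, and flavors with equal estimates are a 50/50 toss-up against each other.

```python
import math


def win_probability(beta_i, beta_j):
    """Bradley-Terry probability that flavor i is chosen over flavor j."""
    return 1.0 / (1.0 + math.exp(-(beta_i - beta_j)))


# Made-up estimates for illustration only.
estimates = {"Top flavor": 0.8, "Grouped flavor 1": 0.0,
             "Grouped flavor 2": 0.0, "Bottom flavor": -0.6}
print(win_probability(estimates["Top flavor"], estimates["Grouped flavor 1"]))
print(win_probability(estimates["Grouped flavor 1"], estimates["Grouped flavor 2"]))  # 0.5
```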
It wasn’t all that surprising to me that the results came out similar to the choice modeling Melinda used, but I thought it was interesting to use the generalized regression approach to see which flavors separate themselves from the pack. Alas, the Canadian flavors didn’t hold up, but at least they weren’t the worst.
I think there's still more to the story: the Bradley-Terry model can be extended to comparisons of three at a time, and choice modeling may have more to reveal. This is likely to be revisited in a future blog post, or perhaps a future Discovery Summit. Thanks for reading!