

Nov 1, 2013

Potato chip smackdown: A generalized regression analysis

[Image: Four rows of three potato chips]

What if we analyzed the data using Generalized Regression? Would the results be any different from those we got using Choice Analysis?

At this year’s JMP Discovery Summit in San Diego, there were plenty of fantastic talks. Having just completed the tasting portion of the potato chip smackdown, I found that one stood out: my colleague Clay Barker’s presentation on using the Generalized Regression personality in the Fit Model platform in creative ways. In particular, he showed how to fit a Bradley-Terry model to compare the relative strengths of basketball teams by looking at the win-loss record for each pair of teams. After seeing Melinda’s analysis using the Choice Analysis platform, I thought it would be interesting to try Generalized Regression and see how the results compare.

Data Preparation

To try this, I needed to set up a new data table in which each pair of flavors that was compared within a choice set appears in its own row, with one flavor coded 1 and the other coded -1. I also needed two more columns: the number of times each pair was compared, and the number of times the flavor coded 1 was chosen over the flavor coded -1. Note that I didn’t need an additional row with the +/-1 values reversed.

If you read the experimental setup, you may recall that we didn’t test the chips in pairs, but three at a time. To do this analysis, I split each choice set into the three pairs that occur within it. Because we recorded the best and the worst chip in each choice set, I can determine the winner and loser of every one of those pairs.
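To make the splitting step concrete, here is a minimal sketch in Python rather than JMP. The function name `pairs_from_choice_set` and the two responses are made up for illustration; they are not the smackdown data. It assumes each response records the three flavors shown plus which was picked best and which worst, so the leftover flavor beats the worst and loses to the best.

```python
from collections import Counter

def pairs_from_choice_set(flavors, best, worst):
    """Return the three (winner, loser) pairs implied by one choice set of three."""
    # The flavor that was neither best nor worst sits in the middle.
    (middle,) = [f for f in flavors if f not in (best, worst)]
    return [(best, middle), (best, worst), (middle, worst)]

# Made-up responses for illustration only (not the actual tasting results).
responses = [
    (("BBQ", "Truffle Fries", "Greektown Gyro"), "BBQ", "Greektown Gyro"),
    (("BBQ", "Truffle Fries", "Greektown Gyro"), "Truffle Fries", "Greektown Gyro"),
]

# wins[(a, b)] counts how many times flavor a was preferred to flavor b.
wins = Counter()
for flavors, best, worst in responses:
    wins.update(pairs_from_choice_set(flavors, best, worst))
```

Each choice set contributes exactly three pairwise results, so the tally grows by three for every response.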

I found it easiest to set it up the same way that Clay did with the basketball data – set values of 1 for the first flavor, then put in values of -1 for one flavor at a time with the remaining flavors to the right, moving down the rows one at a time (for those of you who like to think of matrices, you get something that looks like a negative identity matrix). Just to give you an idea of what this looks like, the first few rows of my data table look like this:


You can take a look at the full data table on the File Exchange. I didn’t create a nice script to collect the data in the format I wanted – it was a matter of using table summaries and manually inputting the times chosen and compared for each pair.
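If you would rather script this step than enter it by hand, something along these lines could build the rows. This is a rough Python sketch, not what I actually did in JMP. The flavor labels, the counts, and the column names `Times Chosen` and `Times Compared` are placeholders I made up for the example. It assumes the flavors not in a pair are coded 0, and it uses one row per pair with the earlier flavor coded 1.

```python
from itertools import combinations

def build_pair_table(flavors, wins):
    """One row per compared pair: +1 for the first flavor, -1 for the second."""
    rows = []
    for first, second in combinations(flavors, 2):
        chosen = wins.get((first, second), 0)            # first preferred to second
        compared = chosen + wins.get((second, first), 0)  # all meetings of the pair
        if compared == 0:
            continue  # this pair never shared a choice set
        row = {flavor: 0 for flavor in flavors}
        row[first] = 1
        row[second] = -1
        row["Times Chosen"] = chosen
        row["Times Compared"] = compared
        rows.append(row)
    return rows

# Made-up counts for three placeholder flavors (not the smackdown data).
example_flavors = ["A", "B", "C"]
example_wins = {
    ("A", "B"): 7, ("B", "A"): 3,
    ("A", "C"): 8, ("C", "A"): 2,
    ("B", "C"): 6, ("C", "B"): 4,
}
table = build_pair_table(example_flavors, example_wins)
```

Because every pair appears once with a fixed orientation, no extra row with the signs flipped is needed.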

The Analysis

Now that I have the data table I wanted, it’s off to Fit Model. The times chosen and times compared are selected as the Y variables, and I add all the chip flavors as effects with No Intercept selected (based on the Bradley-Terry model). With the Generalized Regression personality and a binomial distribution (we’re looking at the probability that one flavor gets picked over the other), I had:


I chose forward selection with AICc validation, although BIC gives the same results. The chosen model looks like this:


Or visually:


Using the pairwise Bradley-Terry model, BBQ and Southern BBQ come out on top, followed by Southern Biscuits and Gravy and Truffle Fries (note that these positions switched from the Choice model, but the estimates are close). The remaining flavors were grouped together, except for poor Greektown Gyro, which wasn’t very popular among the JMP group.
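For readers without JMP, the pairwise model itself can be sketched in a few lines of Python. To be clear, this is not what Generalized Regression does: it is the classic iterative (Zermelo/MM) scheme for the plain maximum likelihood Bradley-Terry fit, with no forward selection, so it will not group flavors together the way the chosen model above does. The function name `fit_bradley_terry` and the counts are made up for illustration, and the sketch assumes every flavor has at least one win and at least one comparison.

```python
from collections import defaultdict

def fit_bradley_terry(wins, n_iter=200):
    """Estimate Bradley-Terry strengths from wins[(i, j)] = times i beat j."""
    items = sorted({name for pair in wins for name in pair})
    total_wins = defaultdict(float)
    for (winner, _loser), count in wins.items():
        total_wins[winner] += count

    # Start from equal strengths and repeat the fixed-point update.
    strength = {name: 1.0 / len(items) for name in items}
    for _ in range(n_iter):
        updated = {}
        for i in items:
            denom = 0.0
            for j in items:
                if j == i:
                    continue
                n_ij = wins.get((i, j), 0) + wins.get((j, i), 0)
                denom += n_ij / (strength[i] + strength[j])
            updated[i] = total_wins[i] / denom
        # Strengths are only defined up to scale, so normalize to sum to 1.
        norm = sum(updated.values())
        strength = {name: value / norm for name, value in updated.items()}
    return strength

# Made-up counts for three placeholder flavors (not the smackdown data).
example_wins = {
    ("A", "B"): 7, ("B", "A"): 3,
    ("A", "C"): 8, ("C", "A"): 2,
    ("B", "C"): 6, ("C", "B"): 4,
}
strengths = fit_bradley_terry(example_wins)
```

Under this model the probability that flavor i is chosen over flavor j is strength[i] / (strength[i] + strength[j]), so the logs of the strengths play the role of the regression coefficients, up to an additive constant.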

Final Thoughts

It wasn’t all that surprising to me that the results came out similar to the choice modeling Melinda used, but I thought it was interesting to use the generalized regression approach to see which flavors separate themselves from the pack. Alas, the Canadian flavors didn’t hold up, but at least they weren’t the worst.

I think there’s still more to the story: The Bradley-Terry model can be extended to comparisons of three items at a time, and there may be more to learn from choice modeling as well. This is likely to be revisited in a future blog post -- or perhaps a future Discovery Summit. Thanks for reading!


Rick Wicklin wrote:

Nice presentation. If you want to see how to fit a Bradley-Terry model in SAS/STAT software, see "Sample 24992: Fitting the Bradley-Terry model to preference data from items presented in pairs".

Ryan Lekivetz wrote:

Thanks for the link for how to do it in SAS/STAT.