Jul 26, 2013

Ruining a perfectly good family gathering with mixed models

Over this past holiday season, my family and I thought it might be fun to do a wine tasting at one of our gatherings. Some of us wanted to simply open a few different bottles and talk about the ones we liked while catching up and enjoying a relaxing Saturday. I suggested instead that this would be a great opportunity to conduct a more formal experiment. Somewhat begrudgingly, my family agreed.

Next, I worked out a data collection method that would give me the most reliable measure of wine quality with the least measurement noise.

The basic idea is to run a repeated measures experiment: randomly present subjects with a sample of an unknown wine and have them rate its quality on a scale from 1 to 7. To get a more stable estimate of quality from our largely untrained wine tasters, I decided to present each wine three times in random order and then average the three scores.
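The averaging step can be sketched in Python, assuming each tasting is recorded as a (subject, wine, score) row. The helper name `average_replicates` and the subject label "S1" are hypothetical; the result plays the role of the Avg. Quality column used later in JMP.

```python
from collections import defaultdict

def average_replicates(ratings):
    """Hypothetical helper: average the replicate ratings for each (subject, wine) pair.

    `ratings` is an iterable of (subject, wine, score) tuples, one per tasting.
    Returns a dict mapping (subject, wine) to the mean score.
    """
    grouped = defaultdict(list)
    for subject, wine, score in ratings:
        # Guard against data-entry errors on the 1-7 response card.
        if not 1 <= score <= 7:
            raise ValueError(f"score {score} is outside the 1-7 scale")
        grouped[(subject, wine)].append(score)
    return {key: sum(scores) / len(scores) for key, scores in grouped.items()}

# Hypothetical ratings (not the study's data): subject "S1" tastes Wine 1 three times.
ratings = [("S1", "Wine 1", 4), ("S1", "Wine 1", 5), ("S1", "Wine 1", 3)]
print(average_replicates(ratings))  # {('S1', 'Wine 1'): 4.0}
```

Averaging before modeling gives one score per subject and wine, which keeps the later model simple at the cost of discarding the replicate-to-replicate spread.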

We designated one family member as the test administrator, who placed each bottle of wine in a paper bag so that the subjects could not see which wine was being poured into their glasses. The test administrator also randomized the presentation order for each repetition, so that the order of the wines within each trial block was not the same across blocks.

The experimental procedure was as follows:

  • The test administrator pours approximately 1 oz. of wine into each subject’s glass.
  • The subject tastes the wine and enters a quality score from 1-7 on a response card.
  • The subject then cleanses his or her palate by eating several crackers and smelling a container of ground coffee.
  • This procedure is repeated for each of the three blocks of four trials (12 tastes and ratings per subject in total).
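One way to generate such a presentation schedule is sketched below: each of the three blocks gets its own shuffle of the four wines, and any order that repeats an earlier block is redrawn so that no two blocks share a sequence. The function name, the seed and the label "Wine 4" (inferred from the four trials per block) are assumptions; the post does not say how the administrator actually randomized.

```python
import math
import random

def presentation_schedule(wines, n_blocks, seed=None):
    """Hypothetical helper: return one shuffled order of `wines` per block.

    Each wine appears exactly once per block, and an order is redrawn if it
    repeats an earlier block, so no two blocks share the same sequence.
    """
    if n_blocks > math.factorial(len(wines)):
        raise ValueError("not enough distinct orders for that many blocks")
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_blocks):
        order = list(wines)
        rng.shuffle(order)
        while order in schedule:  # redraw any order already used
            rng.shuffle(order)
        schedule.append(order)
    return schedule

# Assumed labels: four wines, three blocks, as in the procedure above.
wines = ["Wine 1", "Wine 2", "Wine 3", "Wine 4"]
for block, order in enumerate(presentation_schedule(wines, 3, seed=2013), start=1):
    print(f"Block {block}: {', '.join(order)}")
```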

After the experiment was complete, I entered the responses into a JMP data table. For this experiment, I had a simple research question:

Are some wines rated significantly higher or lower than other wines?

To answer this question, I used the Fit Model platform in JMP (Analyze > Fit Model). Because I am analyzing a repeated measures experiment, I used the Mixed Model personality, which is new in JMP Pro 11.

Subject is a random effect that I want to account for in my fit, and the Mixed Model personality lets me do this easily.

Before fitting the model, it is often a good idea to visualize your data in Graph Builder to see whether you can spot any patterns, outliers or other issues that may affect the subsequent statistical analysis.

In the figure above, you can see each subject's average score for each wine sample. No clear relationship between wine and quality score stands out. In general, Wine 2 and Wine 3 seem to be rated higher, but some subjects gave vastly different ratings to the same wine. You will invariably encounter this kind of subject-to-subject variation when running behavioral or subjective studies. The mixed model analysis lets us account for these differences and test whether our fixed effect (wine sample) is significant.

Here is the model specification in Fit Model. I’ve selected the Mixed Model personality. The response, our Y, is the Avg. Quality score. The only fixed effect in my model is Wine, and the random effect is Subject.
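In equation form, this specification corresponds to a standard random-intercept model. The notation is an assumption for illustration (JMP's report uses its own parameterization): $y_{ij}$ is subject $i$'s average quality score for wine $j$, $\mu$ is the overall mean, $\tau_j$ is the fixed effect of wine $j$ (with $k$ wines in total), $s_i$ is the random effect of subject $i$, and $\varepsilon_{ij}$ is residual error, with $s_i$ and $\varepsilon_{ij}$ independent.

```latex
y_{ij} = \mu + \tau_j + s_i + \varepsilon_{ij},
\qquad s_i \sim \mathcal{N}(0, \sigma_s^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2),
\qquad H_0:\ \tau_1 = \tau_2 = \cdots = \tau_k
```

The test for Wine asks whether $H_0$ can be rejected, while $\sigma_s^2$ and $\sigma^2$ are the subject and residual variance components that the random effects estimates describe.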

Below is the Fit Mixed model output.

The report includes Fit Statistics, the Random Effects Covariance Parameter Estimates and the Fixed Effects Tests, and JMP gives me lots of options for customizing it. The Fixed Effects Tests are what I am really interested in for answering my research question, and they do show a significant effect of wine on quality for this group of subjects. Unfortunately, this test by itself doesn’t tell me which wines are significantly different from the others.
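For a balanced design like this one (every subject rates every wine, with one averaged score per subject and wine), the wine test has a closed-form shortcut: it matches the classic randomized complete block ANOVA F test, and the subject and residual variance estimates follow from the ANOVA mean squares. The Python sketch below (the function name `block_anova` is hypothetical) illustrates that arithmetic only; it is not how JMP fits the model, since JMP uses REML, which also handles unbalanced data, and the shortcut agrees with REML only when the subject variance estimate comes out positive. A p-value could then be obtained from the F distribution, for example with `scipy.stats.f.sf(F, df_num, df_den)`.

```python
def block_anova(scores):
    """Hypothetical helper: wine F test and variance components for a balanced
    subjects x wines table.

    scores[i][j] is subject i's averaged quality score for wine j.
    Returns the F statistic, its degrees of freedom, the wine means, and the
    subject and residual variance estimates.
    """
    n, k = len(scores), len(scores[0])  # subjects, wines
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with at least 2 subjects and 2 wines")
    grand = sum(map(sum, scores)) / (n * k)
    subject_means = [sum(row) / k for row in scores]
    wine_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Partition the total sum of squares into wine, subject and residual parts.
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_wine = n * sum((m - grand) ** 2 for m in wine_means)
    ss_subject = k * sum((m - grand) ** 2 for m in subject_means)
    ss_error = ss_total - ss_wine - ss_subject

    df_wine, df_subject = k - 1, n - 1
    df_error = df_wine * df_subject
    ms_wine = ss_wine / df_wine
    ms_subject = ss_subject / df_subject
    ms_error = ss_error / df_error

    return {
        "F": ms_wine / ms_error,  # tests H0: all wine effects are equal
        "df": (df_wine, df_error),
        "wine_means": wine_means,
        # Method-of-moments estimate; matches REML only when it is positive.
        "var_subject": (ms_subject - ms_error) / k,
        "var_residual": ms_error,
    }

# Hypothetical 3 subjects x 3 wines table, not the study's data.
print(block_anova([[2, 5, 3], [4, 6, 5], [3, 4, 4]]))
```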

For that, I am going to use another new tool in JMP 11: Multiple Comparisons. You will find the Multiple Comparisons option in the red triangle menu for the Fit Mixed output. To perform a post hoc test on my fixed effect, I select All Pairwise Comparisons – Tukey HSD from the Multiple Comparisons options.
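For balanced data, Tukey's HSD comparisons reduce to a simple rule: two wine means differ significantly when their gap exceeds q·sqrt(MS_error / n), where n is the number of subjects and q is the studentized range critical value for k wines and (n − 1)(k − 1) error degrees of freedom. The sketch below uses a hypothetical function name and expects the caller to supply q, for example from a studentized range table or `scipy.stats.studentized_range.ppf(0.95, k, df)` in SciPy 1.7 or later. It illustrates the arithmetic only; JMP's implementation may differ in details such as the degrees-of-freedom method.

```python
from itertools import combinations
from math import sqrt

def tukey_hsd_pairs(means, ms_error, n_subjects, q_crit):
    """Hypothetical helper: compare every pair of wine means against Tukey's HSD.

    means:      dict mapping wine label -> mean score across subjects
    ms_error:   residual mean square from the subjects x wines analysis
    n_subjects: number of subjects (each rated every wine)
    q_crit:     studentized range critical value, supplied by the caller
    Returns the HSD threshold and a list of (wine_a, wine_b, difference, significant).
    """
    # Subject effects cancel in within-subject differences, so only residual variance enters.
    hsd = q_crit * sqrt(ms_error / n_subjects)
    pairs = []
    for (a, mean_a), (b, mean_b) in combinations(means.items(), 2):
        diff = mean_a - mean_b
        pairs.append((a, b, diff, abs(diff) > hsd))
    return hsd, pairs

# Hypothetical inputs, not the study's results.
hsd, pairs = tukey_hsd_pairs({"Wine 1": 3.0, "Wine 2": 5.0, "Wine 3": 4.0}, 1 / 3, 3, 5.04)
for a, b, diff, significant in pairs:
    print(f"{a} vs {b}: diff = {diff:+.2f}, significant = {significant}")
```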

One of the nice features of Multiple Comparisons in JMP is the informative graphic that accompanies the statistics, which makes visualizing pairwise differences easy.

I’ve pinned the labels of the two pairwise differences that are significant. Significant differences are colored red and non-significant differences are colored blue. This is much easier for me to read than a table of statistical output (although that table is still in the report; I’ve just omitted it from this post).

Wine 2 and Wine 3 were rated significantly higher than Wine 1. None of the other comparisons reached significance. It may be that I did not have enough statistical power to reliably detect a difference, so a larger sample next time (inviting the neighbors to our gathering) may be a good idea. But at least for this group of subjects, if we want happy wine drinkers, we should buy Wine 2 and Wine 3 again and avoid Wine 1.

My methods are far from perfect, and other factors that I have not accounted for in my simple model could affect the responses. But it was a fun way to run a quick experiment and analyze it effectively in JMP using some great new features in JMP 11 and JMP Pro 11.

Sean S wrote:

A great graph to start this analysis with would have been a Variability chart with Rating vs Wine and Taster. It would show us the measurement error for each rater and the overall variation in ratings across the wines. One of my many mottos is to always start by showing the raw data.

Daniel Valente wrote:

Hi Sean --

Thanks for the comment. Using a Variability chart as a way to look at the raw data is an excellent idea.