When two models collide: Analysis of change with two-occasion data
Feb 8, 2017 7:56 AM
Have you ever analyzed the same data using two different but appropriate models that led to completely different inferences? Then you may have encountered Lord’s Paradox (Lord, 1967).
This paradox has puzzled researchers for decades, especially those who work in areas where designed experiments aren’t possible (think of psychologists who study parenting and can’t randomly assign children to good and bad parents).
So how can this happen? I’ll explain with an example.
Imagine that you’re a newly hired data analyst of an up-and-coming start-up company. Your manager is eager to get you onboard to help her discover insights from a data set she recently collected. She wants you to examine how customer satisfaction has changed over the company’s first year.
She surveyed 100 customers when the company first started and surveyed them again a couple of weeks ago (one year later). She particularly cares about customers’ previous experience with a key competitor’s product and whether this has an effect on how customer satisfaction changed in the first year of business. After handing you the data, your manager expresses her desire to meet with you ASAP to learn about your findings. You use JMP to open the file, and it looks like this:
Figure 1. Simulated data of customer satisfaction. Tell us what analysis you'd perform in the comments below.
You have information on whether a customer used (or didn’t use) your competitor’s product prior to using yours. You also have customer satisfaction from when the company first started (at time 1 or T1) and one year later (a couple of weeks ago, or T2), and you have a “Change in Customer Satisfaction” column, which is simply the difference T2 – T1. Looks like a straightforward project!
You might think of a few ways of analyzing these data, but let me draw your attention to two options:
1) How about doing an analysis of covariance (aka residualized change model)? You can use the Fit Model platform in JMP (Analyze -> Fit Model: Standard Least Squares personality) with Customer Satisfaction at time 2 as your outcome. You’d include the grouping variable, Used Competitor’s Product, as a predictor and Customer Satisfaction at time 1 as the covariate (both of these would be added as “Construct Model Effects”). Your main interest is in the potential effect of the grouping variable on the Customer Satisfaction at time 2, but you make sure to include Customer Satisfaction at time 1 in the model to control for baseline levels of this variable. Here are the estimates from that model (JMP Tip: You can get indicator parameterization by selecting this option in the red triangle menu of the analysis report):
Figure 2. Results from analysis of covariance. Outcome is customer satisfaction at time 2.
Based on these results, you can conclude there’s an effect of having used your competitor’s product prior to using your company’s! It appears current Customer Satisfaction is significantly lower (by about 1.02 units) for those who used your competitor’s product prior to using yours (your manager will be thrilled to learn this)! The R-squared also suggests this isn’t trivial; 80% of the variance in your outcome is explained by the predictors.
2) But wait, how about simply doing a t-test with the change scores? You can use the Bivariate platform (Analyze -> Fit Y by X -> Means/Anova/Pooled t), entering the Change in Customer Satisfaction column as the outcome and Used Competitor’s Product as your grouping variable (this is equivalent to doing a one-way ANOVA or a simple linear regression with the grouping variable as the predictor, and to testing the group-by-time interaction in a repeated-measures ANOVA on the T1 and T2 scores, but that’s a topic for another blog post). After all, your manager emphasized her interest in understanding how customer satisfaction changes. Let’s take a look at the estimates from this model:
Figure 3. Results from t-test on change scores.
Hmm… what went wrong? According to this model (and I promise you I fit it to the same data table), there is no significant difference in how customer satisfaction changed over the previous year between those who used your competitor’s product and those who didn’t. Even more striking is the nearly zero R-squared from this model!
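If you’d like to see the two estimates side by side outside of JMP, here is a minimal Python sketch. It is not the JMP analysis and not the simulation behind the figures: the group means, noise levels, and the names simulate and group_effects are all assumptions I made up for illustration. It relies on a standard identity: in a model of T2 on a group indicator and T1, the group coefficient equals the T2 gap between groups minus the pooled within-group slope times the T1 gap.

```python
import random
import statistics as st

def simulate(n_per_group=50, seed=1):
    """Hypothetical two-occasion data; every parameter value is an assumption."""
    rng = random.Random(seed)
    rows = []
    # group 1 = used the competitor's product (lower baseline), group 0 = did not
    for group, mean_level in ((0, 7.0), (1, 4.0)):
        for _ in range(n_per_group):
            level = rng.gauss(mean_level, 1.0)  # stable satisfaction level
            t1 = level + rng.gauss(0.0, 0.7)    # survey at time 1
            t2 = level + rng.gauss(0.0, 0.7)    # survey at time 2, no true change
            rows.append((group, t1, t2))
    return rows

def group_effects(rows):
    """Return (ANCOVA effect, change-score effect, pooled within-group slope)."""
    groups = {g: [(t1, t2) for grp, t1, t2 in rows if grp == g] for g in (0, 1)}
    m1 = {g: st.fmean(t1 for t1, _ in pts) for g, pts in groups.items()}
    m2 = {g: st.fmean(t2 for _, t2 in pts) for g, pts in groups.items()}
    # Pooled within-group slope of T2 on T1 (the covariate coefficient in ANCOVA)
    sxy = sum((t1 - m1[g]) * (t2 - m2[g])
              for g, pts in groups.items() for t1, t2 in pts)
    sxx = sum((t1 - m1[g]) ** 2
              for g, pts in groups.items() for t1, _ in pts)
    slope = sxy / sxx
    gap_t1 = m1[1] - m1[0]  # baseline difference between groups
    gap_t2 = m2[1] - m2[0]  # follow-up difference between groups
    ancova = gap_t2 - slope * gap_t1  # group coefficient in T2 ~ group + T1
    change = gap_t2 - gap_t1          # group difference in mean change (T2 - T1)
    return ancova, change, slope

if __name__ == "__main__":
    ancova, change, slope = group_effects(simulate())
    print(f"ANCOVA group effect:       {ancova:.2f}")
    print(f"Change-score group effect: {change:.2f}")
    print(f"Pooled within-group slope: {slope:.2f}")
```

With settings like these, the change-score estimate should hover near zero while the analysis of covariance estimate should be clearly negative, even though nobody’s satisfaction truly changed in the simulated data.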
This is Lord’s Paradox: the possibility of arriving at completely different inferences based on the same data, yet different models, both of which appear to be appropriate. So let’s unveil what’s behind this puzzle, starting by plotting the data using Graph Builder in JMP:
Unveiling the Puzzle
Figure 4. Scatterplot of customer satisfaction at time 1 and 2 by group (i.e., whether customers used your competitor's product). The line of best fit by group and 95% confidence intervals are also displayed.
Notice the customer satisfaction scores from the first survey (x-axis) are systematically different across those who used the competitor’s product and those who didn’t. In fact, customers who used your competitor’s product prior to your company’s had lower customer satisfaction to start with (maybe they struggled adjusting to a new product), and those who didn’t use your competitor’s product had higher customer satisfaction from the get-go. We also don’t observe customers who used your competitor’s product and reported high satisfaction at time 1.
So why does your first model point to a significant effect when the second model doesn’t? It turns out analysis of covariance makes the implicit assumption that your sample came from one population. If the data truly came from one population, then we’d consider it unusual to have group mean differences at baseline (because we randomly sampled from the population) and we’d expect the second survey means, for those who did and didn’t use your competitor’s product, to get closer to the grand mean (this statistical artifact is known as regression toward the mean). But here, the mean within each group remains nearly identical across time, and for that to happen when you’ve sampled from one population, there must be a force (i.e., an effect) keeping the means from getting closer to each other. This “force” is the effect the analysis of covariance identifies. On the other hand, the t-test doesn’t involve such an assumption, and its null result reflects this. Perhaps plotting the data in a different way can illustrate this further…
Figure 5. Force keeping the mean trajectories from regressing toward the grand mean is superimposed on spaghetti plot.
We can see individuals had little change over time and the overall trend is also unchanged; those who started low/high remained low/high in their customer satisfaction.
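The size of that “force” can also be written down, at least as a sketch. The notation here is my own shorthand rather than anything shown in the JMP reports: \bar{Y}_{t,g} is mean satisfaction at time t in group g (g = 1 for customers who used the competitor’s product, g = 0 otherwise), and \hat{\beta}_w is the pooled within-group slope of T2 on T1, which is the covariate coefficient analysis of covariance estimates.

```latex
\begin{align*}
\hat{\delta}_{\text{ANCOVA}} &= (\bar{Y}_{2,1} - \bar{Y}_{2,0}) - \hat{\beta}_w \,(\bar{Y}_{1,1} - \bar{Y}_{1,0}) \\
\hat{\delta}_{\text{change}} &= (\bar{Y}_{2,1} - \bar{Y}_{2,0}) - (\bar{Y}_{1,1} - \bar{Y}_{1,0}) \\
\hat{\delta}_{\text{ANCOVA}} - \hat{\delta}_{\text{change}} &= (1 - \hat{\beta}_w)\,(\bar{Y}_{1,1} - \bar{Y}_{1,0})
\end{align*}
```

So the two estimates differ whenever the groups differ at baseline and the within-group slope is not 1. Measurement noise typically pulls that slope below 1 (that is regression toward the mean at work), which is why a sizable baseline gap can turn a null change-score result into a significant analysis of covariance effect.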
So given what we know now, what insight will you share with your manager? Will you report an effect of having used your competitor’s product on customer satisfaction? Well, if you’re ever faced with a situation like this in real life, be sure to read more on this issue (I’ve certainly oversimplified it here); a helpful article is van Breukelen (2013).
But for the time being, let me clarify that you wouldn’t be in such a pickle if these data had been collected as part of a designed experiment. Random assignment is a key feature of designed experiments that allows us to ensure variability due to customers’ idiosyncrasies is equally spread across groups. That is, the underlying assumption that you sampled from one population from the get-go would be met, and the two models I described here would lead to the same conclusions (although analysis of covariance has more power).
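To make the random-assignment claim concrete, here is a small Python sketch under assumptions of my own: a hypothetical treatment with a true effect of -1 on satisfaction at time 2, 100 customers per study drawn from a single population, and made-up noise levels. The function names (estimates, randomized_study, compare) are illustrative only. It repeats the study many times and summarizes both estimators.

```python
import random
import statistics as st

def estimates(rows):
    """Return (ANCOVA group effect, change-score group effect)."""
    groups = {g: [(t1, t2) for grp, t1, t2 in rows if grp == g] for g in (0, 1)}
    m1 = {g: st.fmean(t1 for t1, _ in pts) for g, pts in groups.items()}
    m2 = {g: st.fmean(t2 for _, t2 in pts) for g, pts in groups.items()}
    sxy = sum((t1 - m1[g]) * (t2 - m2[g])
              for g, pts in groups.items() for t1, t2 in pts)
    sxx = sum((t1 - m1[g]) ** 2
              for g, pts in groups.items() for t1, _ in pts)
    slope = sxy / sxx  # pooled within-group slope of T2 on T1
    gap_t1 = m1[1] - m1[0]
    gap_t2 = m2[1] - m2[0]
    return gap_t2 - slope * gap_t1, gap_t2 - gap_t1

def randomized_study(rng, n=100, effect=-1.0):
    """One hypothetical experiment: one population, groups assigned at random."""
    labels = [0, 1] * (n // 2)
    rng.shuffle(labels)  # random assignment with balanced group sizes
    rows = []
    for group in labels:
        level = rng.gauss(5.5, 1.0)  # same population for everyone
        t1 = level + rng.gauss(0.0, 1.0)
        t2 = level + rng.gauss(0.0, 1.0) + effect * group  # assumed true effect
        rows.append((group, t1, t2))
    return rows

def compare(reps=500, seed=2):
    """Return ((mean, sd) of ANCOVA estimates, (mean, sd) of change estimates)."""
    rng = random.Random(seed)
    results = [estimates(randomized_study(rng)) for _ in range(reps)]
    ancova = [a for a, _ in results]
    change = [c for _, c in results]
    return (st.fmean(ancova), st.stdev(ancova)), (st.fmean(change), st.stdev(change))

if __name__ == "__main__":
    (a_mean, a_sd), (c_mean, c_sd) = compare()
    print(f"ANCOVA:       mean {a_mean:.2f}, sd {a_sd:.2f}")
    print(f"Change score: mean {c_mean:.2f}, sd {c_sd:.2f}")
```

Under these assumptions both estimators should center on the same true effect, and the analysis of covariance estimates should vary less from study to study, which is where its extra power comes from.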
To ease the confusion this topic is known to elicit, I’ll leave you with some take-home points that can help you when you tackle your next data-driven question:
Exploit visualization tools. Get to know your data prior to fitting a model. Oftentimes, we know to look for outliers and data-entry errors, but also remember to check whether your model’s assumptions hold.
I simulated the data for this example, and you can access the file here if you’re inclined to play with it yourself. Undoubtedly, I created a scenario in which differences across models are striking. This facilitates my illustration of the issues, but it doesn’t mean you’re not likely to encounter Lord’s Paradox in your own analyses with real data.
Observational data are tricky. Remember the saying, correlation doesn’t imply causation? Regardless of which model you think is appropriate for your data, neither one tells you whether your competitor’s product is to blame for your customers’ satisfaction levels.
Use the Design of Experiments (DOE) tools in JMP. The issues I outlined here are only encountered in observational studies. Both models above will arrive at the same inference if you conduct a controlled experiment. As long as you can randomly assign your customers to the different groups, you should prefer to conduct an experiment, and the DOE platform can help you do just that!
Replicate, replicate, replicate. What if we had given these data to two different analysts and one fits an analysis of covariance and the other a t-test? This post illustrates one of many reasons why new analyses might not replicate previous findings. We can’t overlook the importance of replicability. This point is essential regardless of whether your data are observational or experimental. If you can validate your results with an independent sample, do it. And when you do, make sure your analysis approach aligns with that of the original study.
Did I pique your interest enough to continue to read on this topic? If so, look for an upcoming post in which I fit a linear mixed effects model to these data.
Lord, F. M. (1967). A paradox in the interpretation of group comparisons. Psychological Bulletin, 68, 304-305.
van Breukelen, G. J. P. (2013). ANCOVA versus CHANGE from baseline in nonrandomized studies: The differences. Multivariate Behavioral Research, 48, 895-922.