I saw this "really striking" jobs chart on Twitter recently. Can you see why the data is not as striking as it appears?
It certainly looks striking. The red line (labor force without a bachelor's degree) and the blue line (labor force with a bachelor's degree) were moving in lockstep until the 2008 US recession, and then the red line abruptly moves in the opposite direction. It's likely there is some relevant story here, but this chart doesn't tell it very accurately.
The problem is that the red line and the blue line are on very different scales. Their absolute ranges and relative ranges are different. The vertical range is 26 million persons for the blue line and only about 12 million persons for the red line even though the latter has much higher values. Here are both series on the same scale:
That's a more accurate picture, and you can still see the divergence after the recession. But the two groups weren't quite moving on the same trajectory before the recession, after all. The red line's increase was shallower and already starting to level off before the recession, and its dip is not as dramatic, especially in relative terms.
It's rarely a good idea to use two scales on a graph, yet it's pretty common. It seems we have a drive to put as much information as possible on a single graph. It's understandable because the result feels more efficient, and maybe we don't notice the added complexity as we construct the graph one variable at a time. But the viewer sees the whole graph at once and has a bit of mental work to do to decipher two different scales. The whole point of data visualization is to leverage the non-thinking perception we get for free from our visual systems.
Nonetheless, the lines on the single-scale graph are pretty far apart, which makes comparisons difficult, so that view is not perfect either. Separate graphs are a natural alternative, but we would still have to figure out how to make the disparate scales comparable. So instead, let's look at ways to make dual scales more perceptually accurate.
First, a technical note: The method for making a dual-scaled graph in Graph Builder in JMP is a little below the surface since it's not a recommended technique. Add two Y variables to the left axis, then right-click on them and choose "Move Right" for one of the variables.
One way to avoid a dual scale altogether is to transform the two variables onto some relative scale, but the downside is that you lose the original units. However, you can get that effect without losing the units by setting each scale as if it were transformed. For example, let's pretend we transform each variable to a percentage of its maximum value. That's equivalent to having the left and right scales both go from 0 to the maximum data value (plus some margin).
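As a rough sketch of that idea outside of JMP, the axis limits can be computed straight from the data. The function name, the 5% margin, and the sample values here are my own assumptions for illustration; they are not the actual jobs data.

```python
def limits_from_zero(values, margin=0.05):
    """Return (low, high) axis limits from 0 to the data maximum plus a margin."""
    top = max(values)
    return (0.0, top * (1.0 + margin))


# Hypothetical values in millions of persons (not the real series).
red = [80.0, 84.0, 82.0]
blue = [30.0, 40.0, 50.0]

left_limits = limits_from_zero(red)    # limits for the left (red) axis
right_limits = limits_from_zero(blue)  # limits for the right (blue) axis
print(left_limits, right_limits)
```

Each pair of limits could then be handed to whatever plotting tool you use as the minimum and maximum of the corresponding axis.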
That feels truer since they're on the same scale in some sense (0 to 100%), but the bottom half of the graph is empty and feels wasted (though it is serving some purpose). We can extend the logic of Figure 3 and scale both axes to 50% of maximum to 100% of maximum.
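Here is a small sketch of the "same fraction of maximum" scaling, assuming we want each axis to run from 50% to 100% of its own maximum. The function name, parameter names, and sample values are hypothetical, chosen only to show the arithmetic.

```python
def limits_by_fraction_of_max(values, low=0.5, high=1.0):
    """Return axis limits as fractions of the series maximum."""
    if not 0.0 <= low < high:
        raise ValueError("need 0 <= low < high")
    top = max(values)
    return (low * top, high * top)


# Hypothetical values in millions of persons (not the real series).
red = [80.0, 84.0, 82.0]
blue = [30.0, 40.0, 50.0]

# Both axes cover the same relative window: 50% to 100% of each maximum.
left_limits = limits_by_fraction_of_max(red)
right_limits = limits_by_fraction_of_max(blue)
print(left_limits, right_limits)
```

In practice you would probably add a small margin above the maximum, as in the 0-to-maximum case, so the top data point does not sit on the frame.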
Now we can see each trend in context of its own range. However, the rates of change are not comparable to each other: A vertical distance of 5 million on the left axis is about the same as 10 million on the right axis. If we wanted to compare rates of change or vertical change in general, we'd need to give both scales the same range.
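One way to give both scales the same range is to take the larger of the two data ranges and apply it to both axes. In this sketch each axis is centered on the midpoint of its own data, which is my own choice (the only requirement is that the spans match), and the names and numbers are hypothetical.

```python
def equal_span_limits(left_values, right_values):
    """Return (left_limits, right_limits) that cover the same number of units."""
    left_lo, left_hi = min(left_values), max(left_values)
    right_lo, right_hi = min(right_values), max(right_values)

    # Use the larger data range so neither series gets clipped.
    span = max(left_hi - left_lo, right_hi - right_lo)

    def centered(lo, hi):
        mid = (lo + hi) / 2.0
        return (mid - span / 2.0, mid + span / 2.0)

    return centered(left_lo, left_hi), centered(right_lo, right_hi)


# Hypothetical values in millions of persons (not the real series).
left_limits, right_limits = equal_span_limits([100, 110], [30, 50])
print(left_limits, right_limits)
```

With equal spans, a vertical distance of 5 million means the same thing on either axis, so slopes can be compared directly.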
Now we can see that the red loss is actually less than the blue gain after the recession, though the loss looked much bigger in the original (Figure 1). Notice that in this mode, with equal ranges, we can still move each entire scale vertically and make the crossover point arbitrary, which is a scary idea. The casual viewer will think the crossover point is significant, but it's not.
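To make the arbitrariness concrete, here is a sketch that finds where two lines appear to cross on screen for a given pair of axis limits. The function name and the data are hypothetical. Both calls use the same data and the same axis span; only the vertical position of the right scale differs.

```python
def crossing_interval(left, right, left_lim, right_lim):
    """Return the first pair of indices between which the drawn lines cross, or None."""

    def screen_pos(value, lim):
        # Fraction of the plot height at which the value is drawn.
        lo, hi = lim
        return (value - lo) / (hi - lo)

    gaps = [screen_pos(a, left_lim) - screen_pos(b, right_lim)
            for a, b in zip(left, right)]
    for i in range(1, len(gaps)):
        if gaps[i - 1] * gaps[i] < 0:  # strict sign change between neighbors
            return (i - 1, i)
    return None


# Hypothetical series (not the real jobs data).
rising = [10, 12, 14]
falling = [30, 29, 28]

# Same data, same span of 8 units on each axis, different right-axis offset.
print(crossing_interval(rising, falling, (8, 16), (24, 32)))
print(crossing_interval(rising, falling, (8, 16), (26, 34)))
```

Sliding the right scale up by two units moves the apparent crossover one time step earlier, even though nothing about the data has changed.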
With only a few scale modifications, we can tell a completely different story: The lines started out together, then the red line surged before finally coming back down to blue line levels again. The scales here are just as justified as those in Figure 1.
For more discussion of the general issue, see Dual-Scaled Axes in Graphs by Stephen Few.
I've referred to the lines by their colors partly because the variable names are wordy but also to stress that the concepts are generic. However, the appropriateness of each scaling technique can depend on the meaning of the data, such as whether relative or absolute change is more relevant.
Now I'd like to hear from you: Given that no view is perfect, which of the six figures do you think represents this jobs data best?