Dual-scaled jobs chart deconstructed, part 1

I saw this "really striking" jobs chart on Twitter recently. Can you see why the data is not as striking as it appears?

Figure 1: Original tweet

It certainly looks striking. The red line (labor force without a bachelor's degree) and the blue line (labor force with a bachelor's degree) were moving in lockstep until the 2008 US recession, and then the red line abruptly moves in the opposite direction. There is likely some relevant story here, but this chart doesn't tell it very accurately.

The problem is that the red line and the blue line are on very different scales: both their absolute ranges and their relative ranges differ. The vertical range is 26 million persons for the blue line and only about 12 million persons for the red line, even though the red line has much higher values. Here are both series on the same scale:
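To make the distortion concrete, here is a small Python sketch that converts a change in persons into the fraction of plot height it occupies for a given axis span. The spans of 26 million (blue) and about 12 million (red) come from the paragraph above; the 3-million-person change and the function name `fraction_of_height` are hypothetical choices for illustration only.

```python
def fraction_of_height(change: float, axis_span: float) -> float:
    """Return the share of the plot's vertical extent that a data change occupies.

    On a linear axis, a change of `change` units covers change / axis_span
    of the plotting area, regardless of where the axis starts.
    """
    if axis_span <= 0:
        raise ValueError("axis_span must be positive")
    return change / axis_span


# Axis spans quoted in the text (persons).
BLUE_SPAN = 26_000_000  # blue line: labor force with a bachelor's degree
RED_SPAN = 12_000_000   # red line: labor force without one (approximate span)

# Hypothetical change, used only to illustrate the arithmetic.
change = 3_000_000

blue_share = fraction_of_height(change, BLUE_SPAN)
red_share = fraction_of_height(change, RED_SPAN)

print(f"blue: {blue_share:.1%}, red: {red_share:.1%}")
print(f"the same change looks {red_share / blue_share:.2f}x taller on the red axis")
```

The ratio of the two shares is just the ratio of the spans, so any change on the red axis is drawn a little more than twice as tall as the same change on the blue axis.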

Figure 2: Single Scale

That's a more accurate picture, and you can still see the divergence after the recession. But the two groups weren't quite moving on the same trajectory before the recession, after all. The red line's increase was shallower and already starting to level off before the recession, and its dip is not as dramatic, especially in relative terms.
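Whether a dip looks dramatic depends on whether you measure it in persons or as a share of the starting level, and the red series has much larger values. The sketch below computes both measures; the before/after values are made-up placeholders, not the actual labor-force figures.

```python
def absolute_change(before: float, after: float) -> float:
    """Change in original units (here, persons)."""
    return after - before


def relative_change(before: float, after: float) -> float:
    """Change as a fraction of the starting value."""
    if before == 0:
        raise ValueError("relative change is undefined for a zero baseline")
    return (after - before) / before


# Hypothetical (before, after) values in persons, for illustration only.
large_series = (100_000_000, 97_000_000)  # larger baseline
small_series = (40_000_000, 37_000_000)   # smaller baseline

for name, (before, after) in [("large", large_series), ("small", small_series)]:
    print(name, absolute_change(before, after), f"{relative_change(before, after):.1%}")
```

Both hypothetical series lose the same 3 million persons, but that loss is a much smaller percentage of the larger baseline.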

Alternative visualizations

It's rarely a good idea to use two scales on a graph, yet it's pretty common. It seems we have a drive to put as much information as possible on a single graph. It's understandable because the result feels more efficient, and maybe we don't notice the added complexity as we construct the graph one variable at a time. But the viewer sees the whole graph at once and has a bit of mental work to do to decipher two different scales. The whole point of data visualization is to leverage the non-thinking perception we get for free from our visual systems.

Nonetheless, the lines in Figure 2 are far apart, which makes comparisons difficult, so it's not perfect either. Separate graphs are a natural alternative, but we would still have to figure out how to make the disparate scales comparable. So instead, let's look at ways to make dual scales more perceptually accurate.

A technical note: the method for making a dual-scaled graph in Graph Builder in JMP is a little below the surface, since it's not a recommended technique. Add both Y variables to the left axis, then right-click one of them and choose "Move Right".

One way to avoid a dual scale altogether is to transform the two variables onto some relative scale, but the downside is that you lose the original units. However, you can get that effect without losing the units by setting each scale as if it were transformed. For example, let's pretend we transform each variable to a percentage of its maximum value. That's equivalent to having the left and right scales both go from 0 to the maximum data value (plus some margin).
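Setting each axis as if its variable had been rescaled to a percentage of its maximum amounts to a small calculation of axis limits in the original units. The sketch below assumes linear axes; the function names, the 5% top margin, and the sample values are hypothetical. With `lo_frac=0.0` it produces 0-to-100% axes like Figure 3, and with `lo_frac=0.5` it produces 50%-to-100% axes like Figure 4.

```python
from typing import Sequence, Tuple


def pct_of_max_limits(values: Sequence[float], lo_frac: float = 0.0,
                      hi_frac: float = 1.0, margin: float = 0.05) -> Tuple[float, float]:
    """Axis limits, in original units, equivalent to plotting values / max(values)
    on an axis running from lo_frac to hi_frac, with a margin added at the top."""
    peak = max(values)
    if peak <= 0:
        raise ValueError("percent-of-max scaling needs a positive maximum")
    return lo_frac * peak, hi_frac * peak * (1 + margin)


def position(value: float, limits: Tuple[float, float]) -> float:
    """Vertical position of a value as a fraction of the axis height (0 = bottom)."""
    lo, hi = limits
    return (value - lo) / (hi - lo)


# Hypothetical series (persons), for illustration only.
left = [30e6, 38e6, 45e6]
right = [80e6, 95e6, 100e6]

left_lim = pct_of_max_limits(left)
right_lim = pct_of_max_limits(right)

# Each series' maximum lands at the same height on its own axis,
# even though the two maxima differ by more than a factor of two.
print(position(max(left), left_lim), position(max(right), right_lim))
```

Because the tick labels stay in persons, the reader keeps the original units while the two axes share the same relative (percent-of-maximum) frame.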

Figure 3: Dual scales at 0 to 100%

That feels truer since they're on the same scale in some sense (0 to 100%), but the bottom half of the graph is empty and feels wasted (though it is serving some purpose). We can extend the logic of Figure 3 and scale both axes from 50% to 100% of each variable's maximum.

Figure 4: Dual scales at 50% - 100%

Now we can see each trend in context of its own range. However, the rates of change are not comparable to each other: A vertical distance of 5 million on the left axis is about the same as 10 million on the right axis. If we wanted to compare rates of change or vertical change in general, we'd need to give both scales the same range.
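Because each axis in Figure 4 has its own span, the same vertical distance stands for a different number of persons on each side, and the conversion factor is simply the ratio of the two spans. The sketch below assumes linear axes; the helper name and the axis limits are hypothetical, picked so that the factor is 2 and echoes the 5-million versus 10-million comparison above.

```python
def right_units_per_left_unit(left_limits, right_limits):
    """How many right-axis units cover the same vertical distance as one left-axis unit."""
    left_span = left_limits[1] - left_limits[0]
    right_span = right_limits[1] - right_limits[0]
    if left_span <= 0 or right_span <= 0:
        raise ValueError("axis limits must be increasing")
    return right_span / left_span


# Hypothetical 50%-100% axes: left maximum 50 million, right maximum 100 million.
left_limits = (25e6, 50e6)
right_limits = (50e6, 100e6)

factor = right_units_per_left_unit(left_limits, right_limits)
print(f"5 million on the left spans the same height as {5e6 * factor / 1e6:g} million on the right")
```

When both axes run from 50% to 100% of their own maximum, this factor reduces to the ratio of the two maxima, so equal slopes on the chart do not mean equal rates of change.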

Figure 5: Dual scale with equal ranges

Now we can see that the red loss is actually less than the blue gain after the recession, though the loss looked much bigger in the original (Figure 1). Notice that in this mode, with equal ranges, we can still move each entire scale vertically and make the crossover point arbitrary, which is a scary idea. The casual viewer will think the crossover point is significant, but it's not.
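To see why the crossover point in an equal-range chart carries no meaning, note that once both axes share a fixed span, the only free choice left is each axis's lower limit, and that choice alone decides where the lines meet. The hypothetical helper below picks the right axis's lower limit so the two lines cross at any index you like; the series and limits are made-up numbers.

```python
def right_lower_limit_for_crossing(left, right, left_lo, span, at):
    """Choose the right axis's lower limit so both lines sit at the same height at index `at`.

    Both axes share the same span, so a value v on an axis with lower limit lo
    is drawn at height (v - lo) / span. Setting the two heights equal at `at`
    gives right_lo = right[at] - (left[at] - left_lo).
    """
    return right[at] - (left[at] - left_lo)


def heights(values, lo, span):
    """Drawn height of each value as a fraction of the axis height."""
    return [(v - lo) / span for v in values]


# Hypothetical series (persons) on an equal 30-million span.
left = [30e6, 34e6, 38e6, 42e6]
right = [90e6, 91e6, 89e6, 87e6]
span = 30e6
left_lo = 25e6

# The same data can be made to cross at every index in turn.
for at in range(len(left)):
    right_lo = right_lower_limit_for_crossing(left, right, left_lo, span, at)
    lh = heights(left, left_lo, span)
    rh = heights(right, right_lo, span)
    print(at, round(lh[at], 3), round(rh[at], 3))
```

Nothing about the data changes between iterations; only the lower limit of one axis moves, and with it the apparent crossover.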

A completely different story

Here's one final graph to show just how dangerous dual-scaled graphs can be.

Figure 6: Dual scales for a custom story

With only a few scale modifications, we can tell a completely different story: The lines started out together, then the red line surged before finally coming back down to blue line levels again. The scales here are just as justified as those in Figure 1.
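A chart like Figure 6 is possible because each axis has two free parameters, its lower and upper limits, and two parameters are enough to pin one line onto the other at any two chosen points. The sketch below is a hypothetical helper, assuming linear axes and made-up data: it solves for the right axis's limits so that the red values at two indices are drawn at the same heights as the blue values, leaving the points in between free to appear to surge or dip.

```python
def height(value, limits):
    """Fraction of the axis height at which a value is drawn (linear axis)."""
    lo, hi = limits
    return (value - lo) / (hi - lo)


def right_limits_to_match(left, left_limits, right, i, j):
    """Solve for right-axis limits (lo, hi) so that right[i] and right[j] are drawn
    at the same heights as left[i] and left[j].

    height = (v - lo) / span gives two linear equations in lo and span.
    """
    p_i = height(left[i], left_limits)
    p_j = height(left[j], left_limits)
    if p_i == p_j or right[i] == right[j]:
        raise ValueError("need distinct heights and distinct values at i and j")
    span = (right[i] - right[j]) / (p_i - p_j)
    if span <= 0:
        raise ValueError("matching these points would require an inverted axis")
    lo = right[i] - p_i * span
    return lo, lo + span


# Hypothetical series, in millions of persons.
blue = [30.0, 34.0, 38.0, 42.0]
red = [90.0, 93.5, 93.2, 93.0]
blue_limits = (25.0, 45.0)

# Force the lines to coincide at the first and last points.
red_limits = right_limits_to_match(blue, blue_limits, red, 0, 3)
print([round(height(v, blue_limits), 2) for v in blue])
print([round(height(v, red_limits), 2) for v in red])
```

Any pair of indices works the same way, which is why a dual-scaled chart can be tuned to tell almost any story about where the lines start, meet, or separate.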

For more discussion of the general issue, see Dual-Scaled Axes in Graphs by Stephen Few.

I've referred to the lines by their colors partly because the variable names are wordy, but also to stress that the concepts are generic. However, the appropriateness of each scaling technique can depend on the meaning of the data, such as whether relative or absolute change is more relevant.

Now I'd like to hear from you: Given that no view is perfect, which of the six figures do you think represents this jobs data best?

Visitor

Emil Friedman wrote:

Could you post the data in a JMP file? It would also be interesting to look at the total of the two groups.

Staff

Xan Gregg wrote:

Good idea to post the data, Emil. Why didn't I think of that?

I've posted the JMP file, which includes some graph scripts, to the JMP File Exchange as Civilian Labor Force 1991 - 2014.

Visitor

geode wrote:

I don't get this. You rightly point out the perceptual problems of using a double scale approach, and then merrily proceed to create a whole bunch of new ways to confuse everyone. There's nothing essentially different between the original 'really striking' version and your adapted Figures 4-6. IMHO the only double-scale example here that should ever reach an audience is your Figure 3 dual scales 0-100%.

Visitor

EngrStudent wrote:

Reproducible programming is key to good science. You should include raw data, and relevant scripts to re-create the graphs.

Staff

Xan Gregg wrote:

Great point, EngrStudent. I used to complain when people published graphs without data, and now I'm doing the same thing! Fortunately, commenter Emil already pointed that out, and I've since put my JMP data table and its embedded graph scripts onto the JMP File Exchange. See link in my earlier comment. The really raw data is at http://research.stlouisfed.org/fred2/.

Visitor

Dual Y axis chart problem wrote:

[…] scales is one thing that makes dual-axis charts hard to use and to read. For a discussion, read Dual-scaled jobs chart deconstructed. I can adjust the scale on the Sales axis and get parallel curves for price and revenue, and in […]

Staff

Xan Gregg wrote:

Hard to disagree with you, geode. Sorry for the confusion along the way. I am hoping that multiple semi-rationalized alternatives will highlight the fact that each tells a biased story.