In my previous post, I examined how the dual scales on this jobs chart are sending an inaccurate message. But my efforts to remake alternate graphs from the raw data revealed another source of message corruption.
Reading the graph text closely, you see that the red line is really an aggregate of three measures obtained from the Federal Reserve Economic Data (FRED) site. That is, the red Less Than Bachelor's line is the sum of three education levels: Less Than High School, High School and Some College. The chart suggests the Less Than Bachelor's group was ascending along with the Bachelor's or Higher group and then took a sharp turn for the worse after the 2008 recession.
Here's a graph of all four individual measures (on the same scale!).
Figure 1: Jobs by Education Level
It turns out that each of the three components has its own story. Less Than High School (orange) was flat until it took a hit just before the recession, and it has slowly declined since. High School (red) was flat and then began declining after the recession. Only Some College (purple) showed any upward trend before the recession; that trend slowed afterwards but kept rising.
Aggregating the three Less Than Bachelor's groups creates a single up-and-down group, though none of the subgroups showed that pattern. Those three groups do have one thing in common, which might justify the aggregation: They all keep losing ground against their pre-recession trend. To make that easier to see, I've added lines of fit (dashed) that are fitted only to the pre-recession data.
Figure 2: Jobs compared to pre-recession trends
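To make the two observations above concrete, here is a minimal Python sketch. The numbers are made up for illustration and are not the FRED series; the names `RECESSION`, `series` and `trend_gap` are hypothetical. It sums three components, none of which rises and then falls, into an aggregate that does, and then measures each series against a line fitted only to its pre-recession points.

```python
import numpy as np

# Hypothetical yearly index values, NOT the FRED data.
# 16 points, with the recession placed at index 8 by assumption.
t = np.arange(16)
RECESSION = 8
after = np.clip(t - RECESSION, 0, None)  # years elapsed since the recession

series = {
    # flat, then slowly declining
    "Less Than High School": 12.0 - 0.3 * after,
    # flat, then declining
    "High School": 36.0 - 0.5 * after,
    # rising, then rising more slowly
    "Some College": 30.0 + 0.5 * np.minimum(t, RECESSION) + 0.2 * after,
}
# The aggregate is simply the sum of the three components.
series["Less Than Bachelor's"] = sum(series.values())


def trend_gap(y):
    """Return y minus a straight line fitted to the pre-recession points only."""
    pre = t <= RECESSION
    slope, intercept = np.polyfit(t[pre], y[pre], 1)
    return y - (slope * t + intercept)


for name, y in series.items():
    gap = trend_gap(y)
    print(f"{name}: peak at index {int(np.argmax(y))}, "
          f"final gap vs. pre-recession trend {gap[-1]:.2f}")
```

In this toy setup only the aggregate peaks at the recession, yet every series ends below its own pre-recession trend line, which is the shared behavior the dashed lines are meant to expose.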
My macroeconomics knowledge and my data are exhausted at this point, so I can't say how fair or significant the aggregated trend really is. Possibly there are demographic factors involved, such as the increasing number of people attending college. Possibly aggregating by some college versus no college is more relevant.
Figure 3: Jobs by aggregated education level
In any case, it's important to remember that a graph communicates a message, but that message is not necessarily accurate or relevant. In Part 1 of this deconstruction, we saw that the dual scales affected our perception of the trends. Here, we see that the aggregation of the data also affects the message.
Making the graph
How was Figure 2 created in Graph Builder? It uses four variables and two elements (a connected line and a regression line). To make the regression lines apply only to the pre-recession data, I added a new variable that was set to 1 before the last recession and 0 otherwise, and I put it in the Frequency role. Then I used the element properties to turn off the Frequency variable for the connected line element, so that it affected only the regression line. For the in-graph labels, I used the annotation tool. You can right-click on an annotation to change the color and other properties. One of Edward Tufte's principles is to include labels close to the data when you can. It's not something that's easy for software to do automatically, but it's not too hard to add manually.
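For readers working outside JMP, the 0/1 Frequency trick can be mimicked with weighted least squares. The sketch below is an analogy in NumPy, not a description of how Graph Builder works internally; the data are synthetic and the variable names are hypothetical. It shows that a fit weighted by a 0/1 indicator gives the same line as a fit to the pre-recession subset alone.

```python
import numpy as np

# Synthetic series, NOT the FRED data: an upward trend with a small wiggle
# that bends downward after an assumed recession at index 8.
t = np.arange(16, dtype=float)
after = np.clip(t - 8, 0, None)
y = 50.0 + 0.4 * t + 0.3 * np.sin(t) - 0.6 * after

# Indicator playing the part of the frequency column: 1 before the recession, 0 otherwise.
freq = (t < 8).astype(float)

# np.polyfit's w multiplies each residual before squaring; for 0/1 weights
# that is the same as counting each point once or not at all.
weighted_fit = np.polyfit(t, y, 1, w=freq)

# Reference: fit only the rows where the indicator is 1.
subset_fit = np.polyfit(t[freq == 1], y[freq == 1], 1)

# Fit to all points, for contrast; the post-recession bend drags it down.
full_fit = np.polyfit(t, y, 1)

print("weighted (0/1):", weighted_fit)
print("subset only:   ", subset_fit)
print("all points:    ", full_fit)
```

The connected line in the figure still needs every row, which is why the indicator has to be switched off for that element and left on only for the regression line.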