Graph Builder in JMP can generate dozens of types of charts, but it always starts with some form of dot plot. The idea is to show the data in its most basic representation, one dot per row. Beyond the familiar scatterplot form with two continuous variables, dot plots adapt to many situations: any combination of continuous and categorical variables, univariate or multivariate data, and small or medium-sized data sets.
However, in one situation the dot plot invariably gets replaced with a bar chart, even at the expense of obscuring or distorting the data: when there is one continuous variable and one categorical variable, with one data value per category. After seeing yet another truncated bar chart, I wondered on Twitter why this happens and found similar thinking from data visualization educator and consultant Nick Desbarats of Practical Reporting. We decided to team up on this article, and in the next section, Nick frames the issue from his teaching experience.
One of the chart types that I [Nick] cover when teaching Stephen Few’s Show Me the Numbers course is the Cleveland dot plot (named after pioneering data visualization researcher William S. Cleveland):
If you’re like most of my workshop participants, though, you won’t recognize this chart type. When I ask participants if they’ve seen one before, somewhere between 0% and 10% of hands typically go up. Immediately following the few hands that do go up, a much larger number of eyebrows invariably go down. The dominant emotion on the faces of participants is unmistakable: it’s confusion.
If you’re confused too, let me explain: The chart above is essentially a horizontal bar chart with seven bars, but with the end of each bar indicated by a dot and the bars removed. Why would I use dots instead of bars? Because, if I used bars, my chart would look like this:
A bar chart such as this would be misleading because it makes Brazil’s record, for example, look about twice that of Indonesia, since its bar is about twice as long. The value scale below the bars, however, tells us that that’s not even close to true. This perceptual problem is why bar charts with quantitative scales that don’t include zero are a best-practices no-no. One way to solve it would be to start the scale at zero:
This is perceptually safer but, with this particular data, it’s difficult to see the differences between the values because they’re so close to one another. When comparing long jump records, though, small differences are very important, so this graph wouldn’t be a very effective way to visualize this data. How could we visualize these values so that the differences between them are clearly visible, without risking visual misperception? The Cleveland dot plot:
Because there are no bar lengths to compare in a dot plot, the risk that viewers will mistakenly perceive that, for example, Brazil’s record is twice that of Indonesia is much lower, and they can also see the small differences between the values much more clearly and easily. Like I said, useful.
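The two problems described above can be put into numbers. In a bar chart whose scale starts at a baseline z rather than at zero, values a and b are drawn as bars of length a − z and b − z, so the ratio the eye compares is (a − z)/(b − z) rather than a/b. With a zero-based scale, on the other hand, the gap between close values covers only a small fraction of the axis. The Python sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, and the values are invented for illustration, not the actual national records.

```python
from typing import Sequence


def apparent_ratio(a: float, b: float, baseline: float) -> float:
    """Ratio of two bar lengths when the bars start at `baseline` rather than at zero."""
    if min(a, b) <= baseline:
        raise ValueError("both values must lie above the baseline")
    return (a - baseline) / (b - baseline)


def spread_fraction(values: Sequence[float], axis_min: float, axis_max: float) -> float:
    """Share of the axis width covered by the range of the values."""
    if axis_max <= axis_min:
        raise ValueError("axis_max must exceed axis_min")
    return (max(values) - min(values)) / (axis_max - axis_min)


# Hypothetical values in metres, for illustration only (not the actual records).
a, b = 8.0, 7.5

# Truncated bar chart: bars start at 7.0, so a's bar is twice as long as b's...
print(apparent_ratio(a, b, baseline=7.0))  # 2.0
# ...even though a is only about 7% larger than b.
print(round(a / b, 3))  # 1.067

# Zero-based bar chart: the 0.5 m gap covers only a sliver of a 0-to-8.5 axis.
print(round(spread_fraction([a, b], 0.0, 8.5), 3))  # 0.059
# Dot plot on a tight 7.4-to-8.1 scale: the same gap covers most of the axis.
print(round(spread_fraction([a, b], 7.4, 8.1), 3))  # 0.714
```

A dot plot can use the tight scale safely because nothing in it invites a length comparison; only the positions of the marks carry the values.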
Yes, the confusion that my workshop participants experience is short-lived. Following a 30-second explanation, the eyebrows come back up and a quiet “ooooh…” rises from the room. And yes, even without my explanation, most participants would eventually figure it out, although I wonder how many would bother to make that journey unaided. Participants regularly (and sheepishly) tell me that, even though they now understand how dot plots work and how useful they are, they won’t use them because their audiences won’t bother to figure them out.
To those of us who are familiar with dot plots, this seems strange. Why would so many people have trouble grasping dot plots when those very same people have no problem grasping bar charts and line charts, as well as other chart types that are arguably less intuitive, such as scatterplots? After observing thousands of people react to a dot plot for the first time, I now think that a combination of factors is plausibly at work:
Note that none of these problems arise from the fact that the graph above is a dot plot but, instead, from the individual design elements (dot shape, presence/orientation of gridlines, etc.) of this particular dot plot design, which is the traditional design for dot plots. By changing the design elements, then, might we be able to make dot plots “friendlier” to those who are encountering them for the first time? Even, possibly, friendly enough that they require no explanation at all?
My recent tweets on dot plot alternatives dovetailed with Nick’s thinking, and we decided to collaborate on this article in an effort to design a friendlier dot plot. The main goals of our redesign, then, would be:
How might we modify the design elements of the Cleveland dot plot to achieve these redesign goals and make dot plots friendlier, then? Thinking broadly at first, we can imagine the following changes:
The possible combinations of design elements add up quickly (or, should we say, “multiply up” quickly), and the above options don’t yet include non-data variables like chart size, orientation, and borders, which also influence comprehension. The gallery below shows just a few combinations from our broad array of options. The data comes from Wikipedia’s list of national long jump records. To limit the number of categories while still providing a reasonable spread of values, we display only the countries with the largest populations. As a result, small countries with excellent long jump records do not appear.
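To see how quickly the options “multiply up,” here is a minimal Python sketch. The option lists are hypothetical, assembled loosely from design elements this article mentions (mark type, gridlines, category guides, orientation, borders); they are not the actual set of options we explored.

```python
from itertools import product
from math import prod

# Hypothetical option lists, for illustration only; not the full set we considered.
options = {
    "mark": ["dot", "pipe", "faded bar"],
    "gridlines": ["none", "horizontal", "vertical"],
    "category guide": ["none", "thick light band", "thin separator"],
    "orientation": ["horizontal", "vertical"],
    "border": ["none", "box"],
}

# The number of distinct designs is the product of the option counts, not their sum.
n_designs = prod(len(choices) for choices in options.values())
print(n_designs)  # 3 * 3 * 3 * 2 * 2 = 108

# itertools.product enumerates every combination explicitly, one dict per design.
designs = [dict(zip(options, combo)) for combo in product(*options.values())]
assert len(designs) == n_designs
```

Adding just one more two-way choice, such as chart size, would double the count again.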
When the number of categories is larger, it becomes more difficult to visually associate marks with their corresponding category labels, so the design choices that work best for charts with a small number of categories may differ from those for charts with a larger number of categories. The gallery below features a few design element combinations with a larger number of categories:
As if we didn’t already have enough design choice combinations to consider, there’s also the orientation (horizontal or vertical) to throw into the mix:
Since we can’t show all possible design element combinations and we want to encourage experimentation, we made a cool interactive Observable notebook that allows anyone to experiment with different combinations of design elements. On that page, you can use a control panel (see screenshot below) to instantly see what different design combinations look like in a sample chart that’s regenerated every time a design element is changed.
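The idea behind the notebook, treating each design element as a parameter and redrawing the chart whenever one changes, can be sketched in a few lines of Python. This is a hypothetical, text-only toy, not the notebook’s code: the function name, the two mark styles, the two guide styles, and the sample values are all assumptions made for illustration. The “dotted” guide mimics the traditional Cleveland design, with a dotted line across each category row.

```python
from typing import Dict

# Hypothetical design-element settings for a toy, text-only dot plot.
MARKS = {"dot": "o", "pipe": "|"}
GUIDES = {"none": " ", "dotted": "."}


def text_dot_plot(
    data: Dict[str, float],
    lo: float,
    hi: float,
    width: int = 40,
    mark: str = "dot",
    guide: str = "dotted",
) -> str:
    """Render one row per category with a single mark at each value's position.

    The scale runs from `lo` to `hi` and need not include zero, because the
    mark positions, not bar lengths, encode the values.
    """
    if hi <= lo or width < 2:
        raise ValueError("need hi > lo and width >= 2")
    mark_char, guide_char = MARKS[mark], GUIDES[guide]
    label_width = max(len(label) for label in data)
    rows = []
    # Largest value first, a common ordering for dot plots.
    for label, value in sorted(data.items(), key=lambda kv: kv[1], reverse=True):
        # Map the value linearly onto a column in [0, width - 1], clamping outliers.
        col = round((value - lo) / (hi - lo) * (width - 1))
        col = min(max(col, 0), width - 1)
        # The guide character fills the rest of the row, like a category gridline.
        line = guide_char * col + mark_char + guide_char * (width - 1 - col)
        rows.append(f"{label.rjust(label_width)}  {line}")
    # Axis labels at both ends of the scale.
    lo_txt, hi_txt = f"{lo:g}", f"{hi:g}"
    rows.append(" " * (label_width + 2) + lo_txt + hi_txt.rjust(width - len(lo_txt)))
    return "\n".join(rows)


# Hypothetical values for illustration only; change the settings to compare designs.
sample = {"A": 8.0, "B": 7.5, "C": 7.8}
print(text_dot_plot(sample, lo=7.4, hi=8.1, width=15))
print(text_dot_plot(sample, lo=7.4, hi=8.1, width=15, mark="pipe", guide="none"))
```

A real chart would of course be drawn with a plotting tool such as Graph Builder; the point here is only that each design element becomes an independent switch.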
After experimenting with various combinations ourselves, and keeping our redesign goals in mind, we hypothesized that the designs below would perform best with dot-plot-naïve users or, at least, perform better than the conventional design.
For data sets with fewer than about eight categories:
We favored these designs for several reasons:
If the data to be visualized has more than about eight categories, associating the marks with their category labels becomes more effortful and error-prone, and so some form of category guide becomes necessary. To our eyes, thick but light category guides worked best with pipe marks, and thin separator lines worked best with faded bar marks:
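The rule of thumb in this section can be written down as a tiny decision function. This hypothetical Python sketch encodes only what the text states: roughly eight categories as the point where a category guide becomes necessary, and the guide style that paired best with each mark type. The cutoff is approximate, and treating exactly eight categories as “no guide” is an assumption; like the designs themselves, none of this has been tested on users.

```python
from typing import Optional

# Approximate cutoff from the text ("about eight categories"); the name is hypothetical.
CATEGORY_GUIDE_THRESHOLD = 8

# The guide style the text pairs with each mark type.
GUIDE_FOR_MARK = {
    "pipe": "thick, light category guides",
    "faded bar": "thin separator lines",
}


def suggest_guide(n_categories: int, mark: str) -> Optional[str]:
    """Return the suggested category guide, or None when no guide is needed."""
    if mark not in GUIDE_FOR_MARK:
        raise ValueError(f"unknown mark type: {mark!r}")
    if n_categories <= CATEGORY_GUIDE_THRESHOLD:
        return None
    return GUIDE_FOR_MARK[mark]


print(suggest_guide(5, "pipe"))        # None
print(suggest_guide(15, "faded bar"))  # thin separator lines
```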
We haven’t tested these designs on dot-plot-naïve users against the conventional design, however, so these are only hypotheses at this point. For the reasons discussed above, though, we suspect that they’d perform better than the conventional dot plot design. Let us know what you think in the comments, or if you think that other combinations of design elements might make for a friendlier dot plot.