The name “dot plot” can refer to a variety of completely different graph styles. Well, they have one thing in common: They all contain dots. For analytic use, the two most prominent styles are what we might call the Wilkinson dot plot and the Cleveland dot plot.
The Wilkinson dot plot displays a distribution of continuous data points, like a histogram, but shows individual data points instead of bins.
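To make the idea of stacking individual points concrete, here is a minimal sketch of greedy dot-density binning in the spirit of Wilkinson's algorithm. It sorts the values, starts a stack at the smallest unplaced value, adds every value within one dot width of it, and centers the stack on the midpoint of its smallest and largest members. This is a simplified illustration, not JMP's implementation. The function name `dot_stacks` and the `dot_width` parameter are hypothetical.

```python
from typing import List, Tuple


def dot_stacks(values: List[float], dot_width: float) -> List[Tuple[float, int]]:
    """Simplified greedy dot-density binning (hypothetical helper).

    Returns (stack_center, stack_height) pairs. Each stack would be
    drawn as `stack_height` dots piled vertically at `stack_center`.
    """
    if dot_width <= 0:
        raise ValueError("dot_width must be positive")
    xs = sorted(values)
    stacks = []
    i = 0
    while i < len(xs):
        start = xs[i]
        j = i
        # Gather every value within one dot width of the stack's first value.
        while j < len(xs) and xs[j] - start < dot_width:
            j += 1
        members = xs[i:j]
        # Place the stack midway between its smallest and largest members.
        center = (members[0] + members[-1]) / 2
        stacks.append((center, len(members)))
        i = j
    return stacks


print(dot_stacks([1, 2, 2, 5, 6, 9], dot_width=2))  # [(1.5, 3), (5.5, 2), (9.0, 1)]
```

Unlike a histogram, the stack positions come from the data rather than from a fixed grid of bin edges, so each dot still corresponds to one observation.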
Though variations of such plots have been around for more than 100 years, Leland Wilkinson’s seminal paper “Dot Plots” largely standardized the form. Last summer, the support for Wilkinson dot plots in JMP was greatly enhanced by an add-in, which is now built into JMP 11.1 (see the Wilkinson dot plot blog post).
The Cleveland dot plot is similar to a bar chart, but instead of using length to encode the data values, it uses position. As a result, the dot plot does not need to start its data axis at zero, can use a log axis and is more flexible for overlaying multiple variables. William Cleveland breaks down the estimation aspect of graph perception into three parts: discrimination, ranking and ratioing. In general, dot plots help with the first two at the expense of the third, making relative proportions less accessible. For instance, with bars it’s easy to see that one bar is twice as long as another without consulting the axis; with dots, judging that ratio usually means reading the axis values.
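Because a dot carries its value in its position rather than in a length measured from zero, the axis can start anywhere or be logarithmic. The sketch below maps a value to a pixel offset on a linear or log axis. The helper `axis_position` and the axis limits and widths are hypothetical choices for illustration. On the log axis, every doubling moves the dot by the same distance, since equal ratios become equal differences of logarithms.

```python
import math


def axis_position(value, lo, hi, width_px, log=False):
    """Map a data value to a pixel offset along an axis spanning [lo, hi].

    The axis need not start at zero. With log=True, lo and value must be positive.
    """
    if log:
        if lo <= 0 or value <= 0:
            raise ValueError("log axis requires positive values")
        frac = (math.log(value) - math.log(lo)) / (math.log(hi) - math.log(lo))
    else:
        frac = (value - lo) / (hi - lo)
    return frac * width_px


# Linear axis starting at 20 rather than 0: fine for dots, misleading for bar lengths.
print(axis_position(30, lo=20, hi=60, width_px=400))  # 100.0

# Log axis from 1 to 16: each doubling advances the dot by the same amount.
for v in (1, 2, 4, 8, 16):
    print(v, round(axis_position(v, lo=1, hi=16, width_px=400, log=True), 6))
```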
Cleveland’s books, along with Wilkinson’s The Grammar of Graphics, were influential in the creation of Graph Builder, and as a result, the Points element is the default view in Graph Builder for both continuous and categorical data.
Below is a Graph Builder recreation of Cleveland’s display of barley yields. A challenge: Can you spot the odd feature of the data?
The dotted lines were presumably a concession to black-and-white printing; faint gray lines are more common in dot plots. Beyond the usual drag-and-drop of variables into roles, the Graph Builder steps to make the dot plot above are:
Add a Value Ordering property for the Variety column (on the Y axis) to match Cleveland's order.
Put the Site variable in the Group Wrap role and set the number of columns to 1.
Turn off Show Title for Site.
Turn on grid lines for the Y axis.
Change the legend position to the bottom.
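For readers without JMP, the layout produced by these steps can be roughly mimicked as a text-mode plot in plain Python. This is a loose analogue, not JMP's behavior: the yields are invented placeholders rather than Cleveland's barley data, and `render`, `DATA`, `VARIETY_ORDER` and `MARKERS` are hypothetical names. Each panel is one site stacked in a single column (Group Wrap with one column), rows follow a fixed variety order (Value Ordering), dotted rows stand in for the grid lines, and the legend is printed at the bottom.

```python
# Invented placeholder yields, NOT the real barley data: {site: {variety: {year: yield}}}
DATA = {
    "Site A": {"Var1": {1931: 30.0, 1932: 25.0}, "Var2": {1931: 40.0, 1932: 35.0}},
    "Site B": {"Var1": {1931: 22.0, 1932: 28.0}, "Var2": {1931: 33.0, 1932: 38.0}},
}
VARIETY_ORDER = ["Var2", "Var1"]  # analogue of the Value Ordering property
MARKERS = {1931: "o", 1932: "+"}  # one marker per year, like the overlay legend


def render(data, order, markers, lo, hi, width=41):
    """Render one panel per site, stacked in a single column."""
    label_w = max(len(v) for v in order)
    lines = []
    for site, rows in data.items():
        lines.append(site)  # panel label; no separate "Site" title
        for variety in order:
            row = ["."] * width  # dotted grid line for this variety
            for year, value in rows[variety].items():
                if not lo <= value <= hi:
                    raise ValueError(f"{value} outside axis range [{lo}, {hi}]")
                col = round((value - lo) / (hi - lo) * (width - 1))
                row[col] = markers[year]  # a later year overwrites on collision
            lines.append(f"{variety:>{label_w}} |" + "".join(row))
    legend = "   ".join(f"{m} {y}" for y, m in markers.items())
    lines.append(f"Legend: {legend}")  # legend at the bottom
    return "\n".join(lines)


print(render(DATA, VARIETY_ORDER, MARKERS, lo=20, hi=40))
```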
And now the answer to the challenge: The odd feature of the data is that the 1931 values are generally greater than the 1932 values, except at the Morris site, where the pattern reverses. This suggests that the Morris values for the two years may have been swapped.
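The same check can be phrased as a quick computation: average the 1931 minus 1932 difference within each site and flag any site whose sign disagrees with the majority. The sketch below uses invented numbers shaped like the barley layout (site, variety, year), not the real data, and the function name `year_effect_outliers` is hypothetical.

```python
from statistics import mean

# Invented placeholder yields, NOT the real barley data.
SAMPLE = {
    "Site A": {"V1": {1931: 30.0, 1932: 25.0}, "V2": {1931: 40.0, 1932: 34.0}},
    "Site B": {"V1": {1931: 45.0, 1932: 39.0}, "V2": {1931: 50.0, 1932: 46.0}},
    "Site C": {"V1": {1931: 24.0, 1932: 31.0}, "V2": {1931: 28.0, 1932: 36.0}},
}


def year_effect_outliers(data, year_a, year_b):
    """Return sites whose mean (year_a - year_b) difference has the
    opposite sign from the majority of sites."""
    diffs = {
        site: mean(rows[v][year_a] - rows[v][year_b] for v in rows)
        for site, rows in data.items()
    }
    n_positive = sum(1 for d in diffs.values() if d > 0)
    majority_positive = n_positive > len(diffs) / 2
    return [site for site, d in diffs.items() if (d > 0) != majority_positive]


print(year_effect_outliers(SAMPLE, 1931, 1932))  # ['Site C']
```

A flagged site is only a prompt for a closer look at the data, which is exactly what the dot plot makes easy to see at a glance.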