


Jun 23, 2011

John Tukey on the rule of zero-origin scales

I saw the following post recently on Twitter:

Eric Jonas @stochastician Mar 16

There’s basically never a reason to start the y-axis of your comparison graph anywhere besides zero.

It generated several dissenting replies, including one from me. Coincidentally, I had just reread part of John Tukey's classic book Exploratory Data Analysis (1977), which presents a good counter-example to that guideline. The example comes from his discussion introducing a variation of the box plot called the "schematic plot." He introduced the general box plot in 1969, and the schematic plot refined it with a specific set of rules for whisker lengths and outlier display. The schematic plot has always been the default in JMP, where it's called an "outlier box plot."
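The schematic plot's rules are compact enough to state in a few lines of code. The following is a minimal Python sketch, assuming Tukey's hinges (the medians of the lower and upper halves of the sorted data), a "step" of 1.5 times the distance between the hinges, inner fences one step beyond the hinges, and outer fences two steps beyond. The function names `tukey_hinges` and `schematic_summary` are hypothetical, and software such as JMP may compute quartiles slightly differently from Tukey's hinges, so whisker ends can differ a little for small samples.

```python
from statistics import median


def tukey_hinges(values):
    """Return (lower hinge, median, upper hinge).

    Hinges are the medians of the lower and upper halves of the sorted
    data; when n is odd, the overall median belongs to both halves.
    """
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one value")
    half = (n + 1) // 2  # half size; includes the middle value when n is odd
    return median(xs[:half]), median(xs), median(xs[n - half:])


def schematic_summary(values):
    """Summarize data the way a schematic (outlier) box plot draws it."""
    lower_hinge, mid, upper_hinge = tukey_hinges(values)
    step = 1.5 * (upper_hinge - lower_hinge)  # one "step" = 1.5 H-spreads
    inner = (lower_hinge - step, upper_hinge + step)
    outer = (lower_hinge - 2 * step, upper_hinge + 2 * step)

    # Whiskers stop at the most extreme values still inside the inner fences.
    inside = [x for x in values if inner[0] <= x <= inner[1]]
    # All points beyond the inner fences are drawn individually
    # (this list also contains the far-out ones).
    outside = sorted(x for x in values if x < inner[0] or x > inner[1])
    # Points beyond the outer fences are "far out".
    far_out = [x for x in outside if x < outer[0] or x > outer[1]]

    return {
        "median": mid,
        "hinges": (lower_hinge, upper_hinge),
        "whiskers": (min(inside), max(inside)),
        "outside": outside,
        "far_out": far_out,
    }


if __name__ == "__main__":
    # Made-up data: nine ordinary values plus one extreme value.
    print(schematic_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

The whiskers never extend to a fence itself, only to the last actual observation inside it, which is why the schematic plot always shows real data values at its whisker ends.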

The example in question uses Lord Rayleigh's measurements of the mass of nitrogen. It's a very small data set by today's standards, and Tukey conveniently lists the data in his book. I also checked Lord Rayleigh's publication, "On an anomaly encountered in determinations on the density of nitrogen gas" (1894), which contains a few more observations and a couple of minor differences from Tukey's data. I've attached CSV and JMP versions of the data set in my JMP User Community post, with Tukey's data in a separate column. Here's an excerpt from Lord Rayleigh's paper:


Importantly, the measurements record details about how the nitrogen itself was produced. The graph below shows the recorded weight (in grams per "globe") versus the nitrogen's source and the purifying agent used.


The main distinction is whether or not the nitrogen came from air, which is how Tukey shows it. Here is some of his text, along with his figures.




Although Tukey is comparing summary views (box plot vs. mean bar chart), his point holds for raw data as well. Here are JMP scatterplot versions of those plots.



It turns out that Lord Rayleigh's "nitrogen" from air also contained other elements unknown at the time, and the small differences led to the discovery of the element argon, for which he won a Nobel Prize.

So while a zero scale is often wise for comparison graphs, there is no substitute for making an intelligent choice. As Tukey suggests, the zero-origin plot doesn't make the case for a Nobel Prize.
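Part of Tukey's point is simple arithmetic: the share of the axis that a between-group difference occupies depends on where the axis starts. A minimal Python sketch, assuming hypothetical helpers `gap_fraction` and `padded_range`, made-up group values of 100 and 101, and an arbitrary 10% margin (none of these come from Tukey or Rayleigh):

```python
def gap_fraction(group_values, axis_min, axis_max):
    """Fraction of the axis height spanned by the spread between group values."""
    if axis_max <= axis_min:
        raise ValueError("axis_max must be greater than axis_min")
    return (max(group_values) - min(group_values)) / (axis_max - axis_min)


def padded_range(values, pad=0.1):
    """Axis limits that hug the data, with `pad` of the data range as margin."""
    lo, hi = min(values), max(values)
    margin = pad * (hi - lo)
    return lo - margin, hi + margin


if __name__ == "__main__":
    # Hypothetical group values differing by 1% (not Rayleigh's data).
    groups = [100.0, 101.0]

    zero_axis = (0.0, 1.1 * max(groups))  # zero origin plus 10% headroom
    data_axis = padded_range(groups)      # axis fitted to the data

    print("zero-origin axis:", round(gap_fraction(groups, *zero_axis), 3))
    print("data-range axis:", round(gap_fraction(groups, *data_axis), 3))
```

With the zero origin, the 1% difference covers under 1% of the axis height; with the data-fitted axis, it covers most of the plot. The exact padding is a judgment call, which is the kind of intelligent choice the zero-origin rule tries to skip.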


Emil Friedman wrote:

When plotting the assay of a pharmaceutical versus time on stability, or to show variation between batches, it would be silly to include zero. It makes much more sense for the scale to be just wide enough to clearly show the acceptance limits and all of the data.

Note that the views expressed are mine alone and do not necessarily reflect the view of my employer.


Paresh Shah wrote:

While I agree with your contention that one should make an "intelligent choice" regarding the zero scale, it is unfortunate that "unintelligent choices" are generally made, particularly in business graphics. I have seen some horrendous examples, one of which I covered in one of my posts. Incidentally, I corresponded with the site referred to in the post, but they merrily continue chopping the axis!!!

In this context, suggesting alternatives such as a dot plot seems more appropriate than suggesting that we can intelligently decide when to chop.


Xan Gregg wrote:

Thanks, Paresh. I'm taking as a given that scale "chopping" is never OK for bar charts since the length of the bar strongly encodes the value. However, it's obviously not universally known given the "horrendous" examples you found.


David Burnham wrote:

It is easy to distort a perspective by changing the scale of an axis. But consider what we do when we perform a hypothesis test in a one-way analysis of variance: the test is conducted relative to a grand average. With regression, the intercept acts as the average. So in both instances the scale is not relative to zero.


Rick Wicklin wrote:

Zero is just a reference value that is often convenient. For plots of ratios, we might want to include 1 as a reference line. For plots of proportions of some disease across subpopulations, we might want to include a reference line for the population at large. For a pharmaceutical study, the reference line might be the response value in a control group.
