curt_hinrichs

Celebrating Statisticians: John W. Tukey

“Statistics is a science in my opinion, and it is no more a branch of mathematics than are physics, chemistry and economics; for if its methods fail the test of experience – not the test of logic – they are discarded.”  - John Wilder Tukey

“Box plot,” “stem and leaf plot,” “ANOVA” and, yes, even “bit,” “software” and “vacuum cleaner” are terms coined by this month’s featured statistician, John Wilder Tukey, whom we are honoring as part of our celebration of the International Year of Statistics. Tukey’s impact on science and society is so wide and significant that I will highlight only a few areas here. As a side note, Tukey’s influence on JMP is also significant, and I’ll save an interesting connection for the end of this blog post.

Many recognize Tukey as the father of exploratory data analysis, in part for creating many effective visual techniques such as the box plot and stem and leaf plot, which are standards in introductory statistics courses today. He also made many enduring contributions in time series, multiple comparisons, ANOVA, robust statistics, and interactive and multivariate graphics. Tukey considered himself a “scientific generalist” and influenced many other scientific fields, including computer science, mathematics, engineering and economics. He even served as an adviser on environmental, defense and education policy to the highest levels of US government. He collaborated with scientists such as von Neumann, Feynman, Cooley and Morgenstern, and served as doctoral adviser to many great statisticians.

Tukey was born on June 16, 1915, in New Bedford, Massachusetts, and was mostly home-schooled as a child. At the age of 18, he entered Brown University and earned his BS and MS in chemistry, thereafter pursuing a PhD at Princeton. Soon after arriving at Princeton, his graduate studies evolved toward mathematics. Tukey’s studies focused mostly on pure mathematics, and his thesis, “On Denumerability in Topology,” was published as a book in 1940; mathematician Paul Halmos includes that book among the influential books of the period 1888-1988. After earning his PhD, Tukey became Math Instructor at Princeton and two years later Assistant Professor of Mathematics. By then, the US was engaged in World War II, and like many other leading academic institutions at that time, Princeton was involved in research to support the war effort.

The Fire Control Research Office under the direction of Merrill Flood was located in Princeton, and Tukey joined the office as Consultant in 1941. The office worked on practical problems of warfare, and these led to the use of statistics and association with statisticians such as Charlie Winsor, whom Tukey credits with converting his interests toward statistics.

“He (Charlie Winsor) was data-oriented. I well remember walking up past old Fine Hall and hearing Charlie say, ‘Well, Sam Wilks trains good mathematical statisticians, and it’s surprising how soon they become good statisticians.’ But, associating with Charlie and living in the data-rich environment where what we were doing was trying to make sense out of data left me with an ultimate data-orientation.” - Tukey

Tukey returned to teaching after the war, and began a transition to statistics research and service in the years immediately following. Within the math department, Tukey joined the Section of Mathematical Statistics, which was headed by mathematical statistician Samuel Wilks. The Applied Statistics seminar began in 1946, introducing many state-of-the-art concepts to the graduate community at Princeton. He became Professor of Mathematics in 1950, Chair of the newly formed Statistics Department in 1966 and was Professor of Statistics and Donner Chair of Science until his retirement from Princeton in 1985.

“...the first time I was in a Statistics course, I was there to teach it.”  - Tukey

In 1945, Tukey also began his 40-year association with Bell Laboratories. The parent company’s focus and expertise in communications and technology provided valuable research, in part in support of national security interests. Beginning in 1960, Tukey served as a member of the President’s Science Advisory Committee, and over the course of his career he advised five US presidents. This work involved advising on environmental and education policy and, in the early 1960s, on nuclear weapons testing policy. It was in this capacity that he made one of his most important discoveries.

While Tukey was serving on President Kennedy’s Science Advisory Committee with the physicist Richard Garwin, the committee discussed the problem of offshore monitoring and verification of a possible nuclear test ban treaty with the Soviet Union. Garwin and Tukey devised a potential solution that involved monitoring seismic activity with remote sensors and analyzing their signals to distinguish an earthquake from a nuclear test. From this discussion and further work by James Cooley of IBM, the Cooley-Tukey Fast Fourier Transform (FFT) algorithm was developed. The Cooley-Tukey algorithm was a substantial simplification of, and a vast improvement on, the calculation of Fourier series and integrals.

“If you speed up any nontrivial algorithm by a factor of a million or so the world will beat a path towards finding useful applications for it.” - from Numerical Recipes
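To give a feel for why the algorithm mattered, here is a minimal Python sketch of the textbook radix-2 form of the Cooley-Tukey idea: split the signal into its even- and odd-indexed halves, transform each half, and combine the results. It is an illustration under stated assumptions, not the code from the original work. The function names fft and dft are my own, and the input length is assumed to be a power of two. The direct sum takes on the order of n² operations; the recursive version takes on the order of n log n.

```python
import cmath


def dft(x):
    """Direct evaluation of the discrete Fourier transform, O(n^2) operations."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def fft(x):
    """Recursive radix-2 FFT (illustrative sketch; len(x) must be a power of two)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # transform of the even-indexed samples
    odd = fft(x[1::2])   # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # The "twiddle factor" rotates the odd half before it is combined.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]
    worst = max(abs(a - b) for a, b in zip(fft(signal), dft(signal)))
    print("largest difference between fft and dft:", worst)
```

Both functions compute the same sums; the saving comes entirely from reusing the two half-length transforms instead of recomputing every term.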

A Few Important Contributions

  • Exploratory Data Analysis: Tukey developed useful techniques for visualizing and summarizing data, including the 5-number summary, the box plot and the stem and leaf diagram. His book Exploratory Data Analysis, published in 1977, introduced many of these techniques at a time when elementary statistical education was mainly focused on inference and hypothesis testing. “Since the aim of exploratory data analysis is to learn what seems to be, it should be no surprise that pictures play a vital role in doing it well. There is nothing better for making you think of questions you had forgotten to ask (even mentally),” Tukey said.
  • Interactive Multivariate Graphics: In 1972, Tukey spent four months at the Stanford Linear Accelerator Center. He worked with Jerome Friedman on the development of PRIM-9, which was the first computer program that offered interactive and dynamic multivariate graphics. This work produced features such as rotation and masking (brushing), along with the concept of projection pursuit.
  • Multiple Comparisons: Some of Tukey’s earliest work in statistics focused on multiple comparisons. The Tukey-Kramer test, or Tukey HSD (Honestly Significant Difference) test, remains a standard for pairwise comparisons in ANOVA more than 60 years after it was introduced.
  • Robust statistics: According to Peter J. Huber, Tukey was “the first to recognize the extreme sensitivity of some conventional statistical procedures to seemingly minor deviations from the assumptions…. In Tukey’s view, robustness is an attribute of the statistical procedure, typically to be achieved by weighting or trimming the observations. This ought to be contrasted to say, George Box’s view, who thought the data should not be tampered with and that the model itself should be robust.”

“Once upon a time statisticians only explored, then they learned to confirm exactly – to confirm a few things exactly, each under very specific circumstances. As they emphasized exact confirmation, their techniques inevitably became less flexible. The connection of the most used techniques with past insights was weakened.  Anything to which a confirmatory procedure was not explicitly attached was decried as 'mere descriptive statistics' no matter how much we had learned from it.”  - Tukey
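For readers who have not met the 5-number summary mentioned in the list above, here is a short Python sketch. It assumes one common convention for the quartile-like values, often called hinges: each is the median of one half of the sorted data, with the overall median counted in both halves when the number of observations is odd. Statistical packages use several different quartile definitions, so results from other software may differ slightly. The function name five_number_summary is my own.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, lower hinge, median, upper hinge, maximum).

    Assumption: hinges are medians of the lower and upper halves of the
    sorted data, and the overall median belongs to both halves when the
    number of observations is odd.
    """
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("need at least one observation")
    half = (n + 1) // 2        # size of each half, median included if n is odd
    lower = data[:half]
    upper = data[n - half:]
    return (data[0], median(lower), median(data), median(upper), data[-1])


if __name__ == "__main__":
    print(five_number_summary([7, 1, 9, 3, 5, 2, 8, 4, 6]))
```

These five values are what a box plot draws: the box spans the two hinges, a line marks the median, and the whiskers reach toward the extremes.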

Tukey earned many awards, including the National Medal of Science, the Samuel S. Wilks Award, the Shewhart Medal, the IEEE Medal of Honor and the Deming Medal. He died on July 26, 2000, in New Brunswick, New Jersey. He made many other contributions that are covered in greater detail elsewhere, and I reference a few sources below. In particular, David Brillinger’s piece provides many firsthand insights into the life and work of Tukey.

The Anscombe Connection

John Tukey’s brother-in-law was Francis J. Anscombe; their wives were sisters. Anscombe was a British statistician who joined Tukey at Princeton for a while and shared many of his views on data analysis and the use of graphics. Anscombe left Princeton for Yale and was a pioneer in the emerging field of statistical computing. In 1973, the American Statistician published his article "Graphs in Statistical Analysis." That paper introduced the “quartet” to make the point that statistical graphics often reveal what the statistics alone do not, and should accompany the analysis of data. The quartet consists of four data sets whose graphs look very different. But if you looked only at the statistical results, which are identical, you would conclude that the four bivariate analyses were the same.

As Anscombe said, “A computer should make both calculations and graphs. Both sorts of output should be studied; each will contribute to understanding.”

These recommendations influenced a young developer and motivated the development of new and innovative data analysis software that was launched in the late 1980s. The software was JMP, and the developer was John Sall.

You can access the data table and script for Anscombe’s quartet in JMP’s sample data directory (Help > Sample Data > See an Alphabetical List of all Sample Data Tables > Anscombe).
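If you do not have JMP at hand, the quartet's point about identical statistics can also be checked with a few lines of Python. The sketch below is an illustration only. The values are transcribed from Anscombe's 1973 paper and are worth checking against the JMP sample table, and the names QUARTET and summarize are my own. It computes the means, the least-squares line and the correlation for each of the four data sets.

```python
from math import sqrt

# Values transcribed from Anscombe (1973); verify against the JMP sample table.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Means, least-squares slope and intercept, and Pearson correlation."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
        "r": sxy / sqrt(sxx * syy),
    }


if __name__ == "__main__":
    for name, (x, y) in QUARTET.items():
        stats = summarize(x, y)
        print(name, {key: round(value, 2) for key, value in stats.items()})
```

To two decimal places the four rows agree, which is exactly why the graphs are needed: only a plot shows the curve in the second set and the single outlying points in the third and fourth.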

Recommended Reading

Anscombe, F.J. (1973) "Graphs in Statistical Analysis," American Statistician 27, 17-21.

Brillinger, D.R. (2002) "John W. Tukey: His Life and Professional Contributions," Annals of Statistics 30, 1535-1575.

Cleveland, W.S. (1984-1994) The Collected Works of John W. Tukey, Volumes I-VIII, Wadsworth & Brooks-Cole.

Cooley, J.W. (1992) "How the FFT Gained Acceptance," IEEE SP Magazine, January 1992, 10-13.

Friedman, J.H. and Stuetzle, W. (2002) "John W. Tukey’s Work on Interactive Graphics," Annals of Statistics 30, 1629-1639.

Press, W.H., Teukolsky, S.A., Vetterling, W.T. and Flannery, B.P. (1986) Numerical Recipes: The Art of Scientific Computing, Cambridge University Press.

Salmon, J. and Valente, D. (2013) "JMP® Extensibility Synergy with MATLAB: Case Studies Using the JMP® Interface to MATLAB," SAS Institute White Paper (in press).

Tukey, J.W. (1977) Exploratory Data Analysis, Addison-Wesley, Boston.

Wikipedia, John Tukey, http://en.wikipedia.org/wiki/John_Tukey (accessed July 28, 2013).


7 Comments
Community Member

Mike Clayton wrote:

When I asked one of my stat mentors why Tukey used the magic 1.5 multiplier in his box and whisker plot to denote the expected range of raw data relative to the IQR (box), and thus highlight outliers, he said that he once asked Tukey that same question, and the answer was "it seems to work."

If anyone has also heard that story, or has a different viewpoint, please comment.

We engineers were often given stories like that by stat gurus to help us remember concepts.

Community Member

Bradley Jones wrote:

If the data were normally distributed with a mean of zero and a standard deviation of one, then the IQR would be about 1.35. The top of the whisker would be at 0.67 + 1.5*1.35 = 2.695. The probability of being between -2.695 and +2.695 is a little larger than 99%. So, for normal data the range of the box plus the whiskers should contain roughly 99% of the observations.

Of course, Tukey did not want to make any distributional assumption. For different distributions the exact percentage of observations you would expect to fall inside the whiskers would vary a bit. That did not worry Tukey. The "magic" 1.5 is a convenient rule of thumb that does, in fact, seem to work. Sometimes loose language is useful, allowing one to avoid unimportant detail.
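The arithmetic in this comment is easy to reproduce. The following Python sketch is a check under the same assumption the comment makes, that the data are standard normal. It uses NormalDist from the standard library's statistics module, and the variable names are my own.

```python
from statistics import NormalDist

z = NormalDist(mu=0.0, sigma=1.0)   # standard normal, as assumed in the comment

q1 = z.inv_cdf(0.25)                # lower quartile
q3 = z.inv_cdf(0.75)                # upper quartile
iqr = q3 - q1                       # interquartile range (the length of the box)

upper_limit = q3 + 1.5 * iqr        # farthest the upper whisker can reach
lower_limit = q1 - 1.5 * iqr        # farthest the lower whisker can reach

# Probability that a standard normal observation falls within the whisker limits.
coverage = z.cdf(upper_limit) - z.cdf(lower_limit)

if __name__ == "__main__":
    print(f"IQR            = {iqr:.3f}")
    print(f"whisker limits = {lower_limit:.3f} to {upper_limit:.3f}")
    print(f"coverage       = {coverage:.4f}")
```

Swapping in a different distribution changes the coverage, which is the point made in the second paragraph of the comment: the 1.5 multiplier is a rule of thumb, not a statement about any one distribution.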

Community Member

Richard Lenski wrote:

And let's not forget the jackknife procedure!

Community Member

Pascal R. wrote:

Very interesting. Thanks for sharing.

Community Member

Rick Wicklin wrote:

I'd like to emphasize Tukey's contributions through his students. He produced 55 PhDs, including his EDA disciples Hartigan, Hoaglin, and Mosteller. In all, he has almost 1,200 "descendants," who were influenced by his views on how to analyze data. For a complete list, see http://genealogy.math.ndsu.nodak.edu/id.php?id=15860
