Celebrating statisticians: William Sealy Gosset (a.k.a. Student)
Oct 7, 2013 10:41 AM
To many of us, statisticians or not, the name William Sealy Gosset may be unrecognizable. His pseudonym Student, however, reveals him as one of the most prominent statisticians in history. Student’s t-test is an important part of every introductory statistics course, so everyone from students who take a single statistics course to those who have devoted their lives to the discipline is familiar with his work.
Gosset was born in Canterbury in 1876, and studied chemistry and mathematics at New College, Oxford. After university, William was hired by Arthur Guinness, Son & Co. as a brewer at the St. James' Gate brewery in Dublin, where he worked from 1899 until 1935. At the time, Guinness had become interested in hiring scientists who could apply their skills to the brewing process, and Gosset did not disappoint. In 1904, he wrote an internal report titled The Application of the Law of Error to the work of the Brewery, in which he made a case for introducing statistical methodologies to the brewing industry (Pearson, 1939). Gosset’s first published paper was an application of the Poisson distribution to yeast counts (Student, 1907).
In the conclusion of his report, Gosset suggested consulting a “mathematical physicist” to address some of the more theoretical concerns. In 1906, he took a leave of absence from the brewery to study in the Biometric Lab of Karl Pearson. During this time, Gosset learned about distributional theory and the correlation coefficient. However, the large-sample theory that was made available to Gosset was not entirely practical for his work at the brewery; he seldom had samples “large” enough to satisfy the assumptions of these methods.
This lack of small-sample methodology led to Gosset’s most famous work, in which he summarized the first four moments of the sample variance and noticed their striking similarity to a Pearson Type III curve (Student, 1908a). His paper contains the derivation of the t-test (though not in its current form), some empirical work, examples and a statistical table for general use. However, the t-test would not see much use outside of his own brewery for many years.
Eventually, a young statistician named Ronald Fisher wrote to Gosset about the denominator of his sample variance: why was it not (n-1)? When Gosset asked Pearson about this, Pearson replied that the choice between n and (n-1) means little in large samples, and that only “naughty brewers” “take n so small that the difference is not of the order of probable error” (Pearson, 1939). This initial exchange led to a lifelong friendship between Gosset and Fisher. Fisher thought highly of Gosset’s work and eventually reparameterized Gosset’s derivation into the familiar t distribution with corresponding degrees of freedom we know today. It is also perhaps through Fisher’s insistence and promotion of the work that the method found more general use outside of the brewery.
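To make the n versus (n-1) question concrete, here is a minimal sketch in Python using only the standard library. The data values and the helper name variance_both_ways are hypothetical, chosen purely for illustration; nothing here comes from Gosset's or Fisher's own calculations. It computes the variance of the same measurements with both denominators, once for a sample of 4 and once for a sample of 1000.

```python
import statistics


def variance_both_ways(sample):
    """Return (variance dividing by n, variance dividing by n-1)."""
    # pvariance divides the sum of squared deviations by n,
    # variance divides the same sum by n - 1.
    return statistics.pvariance(sample), statistics.variance(sample)


# Hypothetical measurements, not real brewery data.
small = [2.0, 4.0, 4.0, 6.0]   # n = 4
large = small * 250            # n = 1000, the same values repeated

for label, data in (("small", small), ("large", large)):
    by_n, by_n_minus_1 = variance_both_ways(data)
    ratio = by_n / by_n_minus_1  # always equals (n - 1) / n
    print(f"{label}: n={len(data)}, divide by n: {by_n:.4f}, "
          f"divide by n-1: {by_n_minus_1:.4f}, ratio: {ratio:.4f}")
```

With 4 observations the two estimates differ by a quarter (the ratio is 3/4), while with 1000 observations the ratio is 999/1000, which is why the choice mattered to someone working with very small samples.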
Gosset wrote a companion manuscript to his 1908 paper for the correlation coefficient (Student, 1908b), made contributions to the design and analysis of agricultural experiments, and later published papers in support of the theory of natural selection. In 1935, he moved to London to be head brewer at the new Guinness Park Royal brewery. He died in 1937 at the age of 61, survived by his wife, three children and one grandson. He published 22 manuscripts.
How did William Sealy Gosset become known as Student?
Perhaps out of fear of losing a competitive advantage, the brewery enforced a rule that forbade its scientists from publishing their research. Gosset argued that his work would be of no benefit to other brewers and was finally allowed to publish using a pseudonym – Student – to prevent other employees from noticing. It is interesting to note that two other chemists from the brewery published statistical work under assumed names: Sophister and Mathetes (Hotelling, 1930).
From the manuscripts listed below, it is possible to develop a very accurate picture of William Sealy Gosset. He was well-liked and respected by several notable statisticians including R.A. Fisher, Karl Pearson and Egon Pearson (Fisher, 1939; Pearson, 1939). He was a modest man, downplaying the importance of his work to the point where he declared “Fisher would have discovered it all anyway” (Boland, 1984). There is an interesting account of Gosset and Fisher’s relationship (not always free from statistical argument) written by Fisher’s second eldest daughter (Box, 1981). McMullen, a former brewery coworker who marveled at Gosset’s many accomplishments, wrote a touching piece that describes Gosset’s personality and his many interests, including gardening, boat-building, biking, golfing, sailing and fishing (McMullen, 1939). Many of the aforementioned articles contain excerpts from Gosset’s correspondence with Fisher and Karl Pearson and illustrate his good sense of humor (Boland, 1984; Box, 1981; Pearson, 1939).
If you have further interest in Gosset’s life and work, I recommend that you read one or more of the references listed below. In particular, Boland’s (1984) manuscript has a wonderful graphic of overlapping timelines that depict major career highlights of Gosset, Fisher and Karl Pearson.
So this month, we celebrate -- and raise a pint to -- William Sealy Gosset, aka Student: statistician, chemist, gardener and naughty brewer.
Boland PJ. (1984). A biographical glimpse of William Sealy Gosset. The American Statistician 38: 179-183.
Box JF. (1981). Gosset, Fisher, and the t distribution. The American Statistician 35: 61-66.
Hotelling H. (1930). British statistics and statisticians today. Journal of the American Statistical Association 25: 186-190.
Fisher RA. (1939). Student. Annals of Eugenics 9: 1-9.
McMullen L. (1939). “Student” as a man. Biometrika 30: 205-210.
Pearson ES. (1939). “Student” as a statistician. Biometrika 30: 210-250.
Student. (1907). On the error of counting with a haemacytometer. Biometrika 5: 351-360.
Student. (1908a). The probable error of a mean. Biometrika 6: 1-25.
Student. (1908b). The probable error of a correlation coefficient. Biometrika 6: 302-310.