The Seven Pillars of Statistical Wisdom

A Brief Book Review of “The Seven Pillars of Statistical Wisdom” by Stephen M. Stigler

All of learning is managing forgetting. That’s something I’ve been saying a lot the last year, especially after I read the remarkable book Make It Stick: The Science of Successful Learning (2014) by Peter C. Brown, Henry L. Roediger III, and Mark A. McDaniel. The meaning of this concept could be considered in two ways. The first is that of employing strategies to help enable the resurfacing of knowledge and wisdom as their stickiness fades with our memory. To explain this, I have borrowed an idea I think I took from the authors: Imagine any idea or unit of knowledge or learning as an object adrift among many in a great sea, with that sea being your general intellect. Whether we like it or not, we each then face the task of trying to find a way to keep that object of knowledge afloat and at the surface, even though time, aging, distraction, etc., conspire to push it below the surface. Brown et al. make the claim that “desirable difficulties” such as low-stakes quizzes, flash cards, or purposeful reflection through writing are great ways to do this. Reading their work helped me double down on deciding to occasionally write reviews after reading books.

The second meaning for the notion that learning is about managing forgetting is far more radical, even today, a few hundred years after its formal finding, and it comes through the statistical concept of aggregation. Aggregation, as explained by the University of Chicago’s Stephen M. Stigler in his fascinating book The Seven Pillars of Statistical Wisdom (2016), is the idea that “discarding information can increase information”. What he means by this is that statistical summaries, such as a mean, median, or mode, are powerful because they purposefully instruct us to take all of our many observations of some event as a whole and to forget the unique information found in each observation. A statistical summary gives us new information that no individual observation contains. To make useful forecasts of the future you will have to try to understand what is typical and to make some thoughtful generalizations. Doing this requires discarding some of the unique attributes of your distinct observations.
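To make the aggregation idea above concrete, here is a minimal sketch in Python. The numbers are invented for illustration and the "true value" is an assumption of the example, not something taken from Stigler's book. It compares how far a typical single observation sits from the truth with how far the mean of all of them does.

```python
from statistics import mean, median

# Hypothetical measurements of one quantity; we assume the true value is 100.
TRUE_VALUE = 100.0
observations = [96.0, 103.0, 99.0, 109.0, 94.0, 101.0, 97.0, 102.0]


def summarize(values):
    """Collapse many observations into two summary numbers."""
    return {"mean": mean(values), "median": median(values)}


def typical_single_error(values, truth):
    """Average absolute error if we trusted one observation at a time."""
    return mean(abs(v - truth) for v in values)


summary = summarize(observations)
error_of_mean = abs(summary["mean"] - TRUE_VALUE)
error_of_single = typical_single_error(observations, TRUE_VALUE)

print(f"mean = {summary['mean']}, median = {summary['median']}")
print(f"error of the mean: {error_of_mean}")
print(f"typical error of a single observation: {error_of_single}")
```

With this made-up data the summary lands much closer to the assumed truth than a typical lone observation does, even though the summary has thrown away which reading was which. Real data will not always behave so neatly; the point is only the direction of the effect.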

Investors Benjamin Graham and David Dodd both knew and understood this. They write in the second edition of Security Analysis (1940) about the usefulness of not judging a company’s earning power by its most recent period of reported results. They often employed statistical summaries stretching across several years, or comparing three-year periods set years apart, as a means of finding out what performance is typical for a company—something unlikely to be gleaned or evident from any individual quarter or year of its life. Yale Professor and Nobel Laureate Robert J. Shiller reprised this idea decades later in his cyclically adjusted P/E ratio (or CAPE). Clearly, all understood the perils of the approach popular today of focusing on assessing earning power based on next twelve months (i.e., NTM), last twelve months (i.e., LTM), or run-rate earnings.
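The multi-year averaging described above can be sketched in a few lines of Python. Everything here is hypothetical: the earnings figures, the share price, and the helper name trailing_average are invented for illustration. Shiller's CAPE also adjusts a decade of earnings for inflation, which this simplified version leaves out.

```python
from statistics import mean

# Hypothetical earnings per share for ten consecutive years, oldest first.
eps_by_year = [2.0, 2.4, 1.1, 0.6, 1.8, 2.6, 3.0, 2.2, 2.9, 4.4]
price = 46.0  # hypothetical share price


def trailing_average(values, years):
    """Mean of the most recent `years` entries."""
    if years <= 0 or years > len(values):
        raise ValueError("years must be between 1 and len(values)")
    return mean(values[-years:])


latest_eps = eps_by_year[-1]
avg_eps_10y = trailing_average(eps_by_year, 10)

pe_latest = price / latest_eps      # judged on the most recent year only
pe_averaged = price / avg_eps_10y   # judged on what was typical over ten years

print(f"P/E on latest year: {pe_latest:.1f}")
print(f"P/E on ten-year average: {pe_averaged:.1f}")
```

In this invented case the most recent year was unusually strong, so a multiple based on it alone makes the company look far cheaper than one based on a decade of typical earnings.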

While to some the many aspects of this principle of aggregation might seem obvious, it is clear from headlines and modern capitalist advertising that to many they are not. On one end of the spectrum you have people upset about the data collected on each of us by companies such as Alphabet, Facebook, and Amazon and how their data capture and use might erode our privacy. And this is a valid concern made real by the clumsy statements and aggressive strategies of these companies. But it is also true that these companies have become highly valuable even though the data they hold on any one of us is, to them, worthless. It is only our data in aggregate that is valuable to them and that allows them to sell advertising space to mass advertisers. No individual advertiser wants to specifically reach you, dear reader; they want to reach a big group of people like you who share your interests or politics. This is all to say that, unless you are incredibly wealthy or famous, your data alone is essentially worthless to one of these behemoths. When you individually choose to leave the social hive of Facebook, its profit engine continues unfettered. When we all leave, it ceases to exist. It is only our data aggregated that is incredibly valuable.

On the other end of the spectrum, we have companies actively suggesting that your individual data is actually incredibly valuable to them. These companies, such as American Express or boutique hotel chains like Kimpton or Canopy, claim that you are neither just a number with them nor a statistic and that you will get the individual attention you and your hard-earned dollars deserve. They are reacting very profitably to the notion that many people do not want to be grouped or feel grouped for statistical summaries. Perhaps this is something these consumers choose to forget when they complain to the Alphabets and Facebooks of the world about the treatment of their data. But I digress. This concept is explored well in Ben Thompson and James Allworth’s wonderful Exponent podcast as well as Yuval Noah Harari’s great Homo Deus: A Brief History of Tomorrow (2015), which I reviewed last year.

Stigler is an expert on the history of statistics, and he has written a strong book. Starting with aggregation, he deftly explains its place as a pillar supporting statistics as a data science since antiquity, bringing in Thucydides in a charming section. Stigler goes on to explain the importance of the other six pillars: Information Measurement, Likelihood, Intercomparison, Regression, Experimental Design, and the idea of the Residual.

The book is written in an accessible style, but it probably should not be a first read in statistics; much is presumed on the part of the reader and much is only suggested. For example, in explaining the importance of Sir Francis Galton and his work Hereditary Genius (1869) to the history of statistics, Stigler writes, “Not all of that book enjoys high reputation today, but the statistical method was sound.” Hmm. As Stigler was born near Minneapolis and taught at the University of Wisconsin-Madison, this might be a very Minnesota-nice way of saying something else; namely, that while Galton was important in the development of modern statistics, he was unfortunately also the father of eugenics and promoted incredibly racist ideas among white intellectuals that we are still fighting today. This isn’t the book for learning that, at least explicitly. (If I am wrong on this point then I will purify myself in the waters of Lake Minnetonka.)

In a longer book, perhaps Stigler would have done away with the passive, drawing-room allusions and discussed more of these historical controversies head-on. Or maybe that’s not his style. But it felt like something was missing here: the book ignores how these important statistical principles, and even much of scientific discovery, are not inherently good. Their goodness seems to depend on the context of their usage and the beliefs of their users.

Moments like that aside, I am glad I read this book as it seemed to provide a useful framework for thinking about statistics and the ever-changing importance (and unimportance) of various ideas in the history of science. It also had a good bibliography I will likely mine for more books, including others written by Stigler.