How to Call B.S. on Big Data: A Practical Guide

At the University of Washington, students are learning to navigate the hazards of our information-addled age. PHOTOGRAPH BY BERTHOLD STEINHILBER / LAIF / REDUX

“Nothing that you will learn in the course of your studies will be of the slightest possible use to you,” the Oxford philosophy professor John Alexander Smith told his students, in 1914, “save only this: if you work hard and intelligently, you should be able to detect when a man is talking rot.” Smith might be pleased to know that this week, at the University of Washington, in Seattle, some hundred and fifty students will complete “Calling Bullshit in the Age of Big Data,” a course less profanely and more prosaically known as INFO 198/BIOL 106B. Taught by Jevin West, an information scientist, and Carl Bergstrom, a biologist, it created something of an online sensation when its syllabus went up, in January, and when registration opened it filled to capacity in less than a minute.

The results of our most recent Presidential election notwithstanding, West and Bergstrom maintain that humans are pretty good at detecting verbal bullshit. Members of the species have, after all, been talking rot for millennia, and its warning signs are well known. Bullshit expressed as data, on the other hand, is relatively new outside scientific circles. Multivariate graphs didn’t begin to appear in the popular press until the nineteen-eighties, and only in the past decade, as smartphones and other information-gathering devices have accelerated the accumulation of Big Data, have complex visualizations been routinely presented to the general public. While data can be used to tell remarkably deep and memorable stories, Bergstrom told me, its apparent sophistication and precision can effectively disguise a great deal of bullshit.

Bergstrom believes that calling bullshit on data, big or otherwise, doesn’t require a statistics degree—only common sense and a few habits of mind. “You don’t have to understand all the gears inside a black box in order to evaluate what you’re being told,” he said. For those who were unable to enroll in INFO 198/BIOL 106B this spring, here is some of his and West’s advice:

• Recognize that bullshitters are different from liars, and be alert for both. To paraphrase the philosopher Harry Frankfurt, the liar knows the truth and leads others away from it; the bullshitter either doesn’t know the truth or doesn’t care about it, and is most interested in showing off his or her advantages.

• Upon encountering a piece of information, in any form, ask, “Who is telling me this? How does he or she know it? What is he or she trying to sell me?” (Journalists have their own versions of these questions.) If you’d ask it at a car dealership, West suggested to the students, you should ask it online, too.

• Remember that if a data-based claim seems too good to be true, it probably is. Conclusions that dramatically confirm your personal opinions or experiences should be especially suspect. Bergstrom pointed the class to a study that compared the language used in letters of recommendation for male and female applicants for chemistry jobs. The researchers hypothesized that the letters for men would use more “ability” words (“talented,” “smart”), whereas those for women would use more “grindstone” words (“hardworking,” “conscientious”). Though they found no evidence to back up the idea, readers aware of the very real gender bias in scientific fields inadvertently tweeted the hypothesis, not the results.

• Use Enrico Fermi’s guesstimation techniques to check the plausibility of data-based claims. Fermi, the Italian physicist who created the first controlled, self-sustaining nuclear chain reaction, was also known for his eerily accurate approximations, which he made by replacing each variable in a problem with a reasonable assumption. In July, 1945, as Fermi watched the Trinity Test in the New Mexico desert, he observed the effect of the explosion on small pieces of falling paper—then used that measurement to accurately estimate the strength of the blast to within an order of magnitude.

• Watch out for unfair comparisons. Claims that many more people watched the video stream of the Trump Inauguration than that of the first Obama Inauguration, for instance, failed to acknowledge the vastly greater availability of streaming video in 2017.

• Remember that correlation doesn’t imply causation. A correlation between two variables (ice-cream consumption and shark attacks) may well be due to a third variable (summer weather). These days, spurious correlations often emerge from data mining, the increasingly common practice of trawling large amounts of information for possible relationships. For instance, there is a statistically significant—but, one hopes, meaningless—relationship between the annual divorce rate in Maine and the annual per-capita consumption of margarine in the United States.

• Beware of Big Data hubris. The Google Flu Trends project, which claimed, with much fanfare, to anticipate seasonal flu outbreaks by tracking user searches for flu-related terms, proved to be a less reliable predictor of outbreaks than a simple model of local temperatures. (One problem was that Google’s algorithm was hoodwinked by meaningless correlations, such as the one between flu outbreaks and high-school basketball seasons, both of which occur in winter.) As with all data-based claims, if an algorithm’s abilities sound too good to be true, they probably are.

• Know that machines can be racist (or sexist, or otherwise prejudiced). Computer models designed to predict individual criminal behavior have shown bias against minorities, possibly because the data used to “train” their algorithms reflect existing cultural biases. Machines are as fallible as the people who program them—and they can’t be shamed into better behavior.

• Mind the Bullshit Asymmetry Principle, articulated by the Italian software developer Alberto Brandolini in 2013: the amount of energy needed to refute bullshit is an order of magnitude bigger than that needed to produce it. Or, as Jonathan Swift put it in 1710, “Falsehood flies, and truth comes limping after it.” Plus ça change.
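
For readers who want to see the two correlation warnings above in action (a hidden third variable, and relationships dredged up by data mining), a few lines of Python will do. What follows is a minimal sketch on synthetic data, not anything taken from the course: the numbers, the straight-line relationships, and the helper names `pearson` and `residuals` are all assumptions made for illustration.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def residuals(ys, xs):
    """Return what is left of ys after subtracting a straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so reruns give the same numbers

# 1. A third variable: in this made-up world, temperature drives both
#    ice-cream sales and shark attacks, and neither affects the other.
temperature = [rng.uniform(0, 35) for _ in range(365)]
ice_cream = [2.0 * t + rng.gauss(0, 10) for t in temperature]
shark_attacks = [0.1 * t + rng.gauss(0, 0.5) for t in temperature]

raw_r = pearson(ice_cream, shark_attacks)
# Correlate only what temperature does NOT explain in each series.
adjusted_r = pearson(residuals(ice_cream, temperature),
                     residuals(shark_attacks, temperature))

# 2. Data mining: compare one short noise series (think ten annual
#    figures) with 1,000 unrelated noise series and keep the best match.
target = [rng.gauss(0, 1) for _ in range(10)]
candidates = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(1000)]
best_r = max(abs(pearson(target, c)) for c in candidates)

print(f"ice cream vs. sharks, raw:            r = {raw_r:.2f}")
print(f"ice cream vs. sharks, temperature removed: r = {adjusted_r:.2f}")
print(f"best match among 1,000 noise series: |r| = {best_r:.2f}")
```

In the first part, the two series should track each other closely until temperature is accounted for, at which point most of the relationship disappears. In the second, searching across enough unrelated series turns up a strong-looking match by chance alone, which is roughly how a pairing like divorce and margarine can surface.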