The New Thing in Google Flu Trends Is Traditional Data

A preschool student in Tyler, Tex., receiving a flu shot this month. Google likes to think of itself as a learning machine, and the latest step to include C.D.C. data with its Flu Trends fits that mold. Credit Sarah A. Miller/The Tyler Morning Telegraph, via Associated Press

Google is giving its Flu Trends service an overhaul — “a brand new engine,” as it announced in a blog post on Friday.

The new thing is actually traditional data from the Centers for Disease Control and Prevention that is being integrated into the Google flu-tracking model. The goal is greater accuracy, after the Google service was criticized for consistently overestimating flu outbreaks in recent years.

The main critique came in an analysis done by four quantitative social scientists, published earlier this year in an article in Science magazine, “The Parable of Google Flu: Traps in Big Data Analysis.” The researchers found that the most accurate flu predictor was a data mash-up that combined Google Flu Trends, which monitored flu-related search terms, with the official C.D.C. reports from doctors on influenza-like illness.
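To make the mash-up idea concrete, here is a minimal sketch in Python. It is not Google's model, nor the method used in the Science article. It is a toy least-squares blend that predicts the current flu level from two inputs: a search-based estimate for the current week and an official report that arrives late. The numbers are invented, the two-week reporting lag is an assumption, and the function names `fit_blend` and `predict` are hypothetical.

```python
# Toy illustration only: synthetic data, hypothetical names, assumed 2-week lag.

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_blend(search, lagged_official, target):
    """Least-squares fit of target ~ a + b*search + c*lagged_official."""
    rows = [[1.0, s, o] for s, o in zip(search, lagged_official)]
    k = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    return solve(xtx, xty)  # normal equations: (X^T X) coefs = X^T y


def predict(coefs, search_value, lagged_official_value):
    """Blend one search-based estimate with one delayed official value."""
    a, b, c = coefs
    return a + b * search_value + c * lagged_official_value


# Invented weekly values: the search signal runs high, the official data runs late.
official = [1.0, 1.2, 1.6, 2.3, 3.1, 3.8, 4.0, 3.5, 2.7, 1.9]
search = [1.3, 1.9, 2.2, 3.6, 4.4, 5.9, 5.6, 5.3, 3.6, 3.0]
LAG = 2  # assumed reporting delay, in weeks

coefs = fit_blend(search[LAG:], official[:-LAG], official[LAG:])
print("fitted coefficients (intercept, search, lagged official):", coefs)
print("blended estimate for the latest week:", predict(coefs, search[-1], official[-1 - LAG]))
```

The design choice worth noting is that neither input is discarded: the search signal supplies timeliness, the official reports supply an anchor, and the fit decides how much weight each one gets.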

The Google Flu Trends team is heeding that advice. In the blog post, Christian Stefansen, a Google senior software engineer, wrote, “We’re launching a new Flu Trends model in the United States that — like many of the best performing methods in the literature — takes official CDC flu data into account as the flu season progresses.”

Google’s flu-tracking service has had its ups and downs. Its triumph came in 2009, when it gave an advance signal of the severity of the H1N1 outbreak, two weeks or so ahead of official statistics. In a 2009 article in Nature explaining how Google Flu Trends worked, the company’s researchers did, as the Friday post notes, say that the Google service was not intended to replace official flu surveillance methods and that it was susceptible to “false alerts” — anything that might prompt a surge in flu-related search queries.

Yet those caveats came a couple of pages into the Nature article. And Google Flu Trends became a symbol of the superiority of the new, big data approach — computer algorithms mining data trails for collective intelligence in real time. To enthusiasts, it seemed so superior to the antiquated method of collecting health data that involved doctors talking to patients, inspecting them and filing reports.

But Google’s flu service greatly overestimated the number of cases in the United States in the 2012-13 flu season — a well-known miss — and, according to the research published this year, has persistently overstated flu cases over the years. In the Science article, the social scientists called it “big data hubris.”

The lesson seems to be that in fields like public health and economics, where there are long-standing information-gathering systems, the smart move is to marry the new data with the old. The new breed of data from the web, cellphones and sensors can be a powerful, knowledge-enhancing asset. But it is a signal, not the signal.

David Lazer, a professor of political science and computer science at Northeastern University and the lead author of the Science article, said it was “great” that Google was overhauling its flu-tracking engine to blend in the C.D.C. data. But Mr. Lazer said he and other researchers would like to examine in detail how Google’s service works, including seeing at least a subset of the flu-related search terms it monitors, something Google has not yet disclosed.

In its blog post, Google said it would soon publish a technical paper on the changes in Flu Trends. But Google’s reluctance to disclose more reflects the fact that Flu Trends runs through the corporate mother ship, the Google search engine. And Google is in a constant cat-and-mouse game with competitors, marketers, advertisers and search consultants trying to figure out the secrets of its search technology.

Mr. Lazer raises a good issue about the terms under which researchers might gain access to the increasingly valuable data sets amassed by private companies. But it’s also an issue that extends well beyond Google and Flu Trends. For example, The Economist recently ran a piece on the foundering efforts to get cellphone data that could be invaluable for tracking the Ebola epidemic.

Google likes to think of itself as a learning machine, and the latest step by Google Flu Trends fits that mold. It built something valuable, and Google’s engineers are improving it as they learn more. Google Flu Trends now operates in 29 countries, and a related Dengue Trends service is in 10 nations.