Google Flu Trends spreads privacy concern

Privacy groups question lack of transparency about the method used to predict outbreaks

Google's new Flu Trends tool, which collects and analyzes search queries to predict flu outbreaks around the country, is raising concerns among privacy groups.

The Electronic Privacy Information Center filed a Freedom of Information Act request asking federal officials to disclose how much user search data the company has recently transmitted to the Centers for Disease Control and Prevention, or CDC, as part of its Google Flu Trends effort.

The concern stems from what privacy groups call a disturbing lack of transparency surrounding the method Google is using to predict flu outbreaks. Google has publicly stated that all the data used is anonymized and aggregated, but the privacy groups said there has been no independent verification of how search queries are used and transformed into data for Google Flu Trends.

"What we are basically saying is that if Google has found a way to ensure that aggregate search data cannot be used to re-identify the people who provided the search information, they should be transparent about that technique," said Marc Rotenberg, president of the Electronic Privacy Information Center.

Rotenberg said the issue is important because the same techniques Google is using to predict flu outbreaks could be applied to tracking other serious diseases, such as SARS. "Let's say we have a spike in Detroit of SARS and the police say we want to know who in Detroit submitted those searches. How can Google ensure that this can't be done? The burden is on Google," Rotenberg said.

Publicly disclosed in November, Google Flu Trends has been described by the company as a Web tool to help individuals and health care professionals obtain influenza-related activity estimates for all U.S. states, up to two weeks faster than traditional government disease surveillance systems.

Google said in a blog post introducing Flu Trends last month that search queries such as "flu symptoms" tend to be very common during flu season each year. A comparison of the number of such queries with the actual number of people reporting flu-like symptoms shows a very close relationship, it said. As a result, tallying each day's flu-related searches in a particular geography allows the company to estimate how many people have a flu-like illness in that region.
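The tallying described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Google's actual method: the term list `FLU_TERMS`, the function name `flu_query_share` and the log format of (region, query) pairs are all hypothetical, and the sketch assumes the log has already been stripped of user identifiers.

```python
from collections import Counter

# Hypothetical list of flu-related search terms; the real term set is not public here.
FLU_TERMS = {"flu symptoms", "flu remedies", "influenza"}


def flu_query_share(log):
    """Return {region: fraction of that region's queries that are flu-related}.

    `log` is an iterable of (region, query) pairs for a single day,
    assumed to carry no user identifiers.
    """
    total = Counter()  # all queries seen per region
    flu = Counter()    # flu-related queries seen per region
    for region, query in log:
        total[region] += 1
        if query.strip().lower() in FLU_TERMS:
            flu[region] += 1
    return {region: flu[region] / total[region] for region in total}
```

Dividing by each region's total query volume, rather than reporting raw counts, keeps heavily populated regions from looking sicker simply because they search more.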

Google also noted that it had shared results from Flu Trends with the epidemiology and prevention branch of the influenza division at the CDC during the last flu season and noticed a strong correlation between its own estimates and the CDC's surveillance data based on actual reported cases. Google said that by making flu estimates available each day, Google Flu Trends could provide epidemiologists with an early-warning system for flu outbreaks.
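A "strong correlation" of this kind is commonly measured with the Pearson coefficient, which ranges from -1 to 1. The sketch below shows that calculation for two weekly series, such as search-based estimates and reported case rates. The article does not say which statistic Google or the CDC used, so the choice of Pearson and the function name `pearson` are assumptions made for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points each")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)
```

A value near 1 would mean the search-based estimates rise and fall in step with the surveillance data; it would not by itself show that the estimates are accurate in absolute terms.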

Rotenberg said the service was potentially useful, but much depended on the kind of search data that Google is collecting and analyzing to make its predictions. Google has said that the database it uses for Flu Trends retains no identity information, IP addresses or any physical user locations. However, it is not clear whether the company is completely deleting IP addresses and, if so, when, he said. Another open question is whether Google is merely anonymizing IP addresses by redacting some of the numbers in an IP string.
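The difference between deleting an IP address and redacting part of it can be shown with a short Python sketch. It is a generic illustration of partial redaction, not a description of what Google does: the function name `redact_ipv4` and the default of keeping 24 bits are assumptions.

```python
import ipaddress


def redact_ipv4(address, keep_bits=24):
    """Zero out the low-order bits of an IPv4 address, keeping `keep_bits` bits."""
    if not 0 <= keep_bits <= 32:
        raise ValueError("keep_bits must be between 0 and 32")
    ip = ipaddress.IPv4Address(address)
    # Build a mask with `keep_bits` leading ones, e.g. 0xFFFFFF00 for 24.
    mask = (0xFFFFFFFF << (32 - keep_bits)) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(int(ip) & mask))
```

With the default setting, `redact_ipv4("192.0.2.57")` yields `192.0.2.0`. The redacted value still narrows the source to a block of 256 addresses, which is why redaction offers a weaker guarantee than deletion.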

Google also claims that as part of its overall privacy policy it anonymizes all IP addresses associated with searches after nine months. Yet in an apparent contradiction, when introducing Flu Trends, Google noted that it uses both current and historic search data -- dating back to 2003 -- to make its predictions, Rotenberg said.

Jeffrey Chester, executive director of the Center for Digital Democracy, said Google's growing presence in the health care space also makes it important for the company to disclose what kind of data it is collecting and using for Flu Trends.

"Google sees a potential profit center from targeting its vast user base with advertising that is related to health issues," Chester said. The company's announcement of Flu Trends in effect shows the pharmaceutical and medical markets precisely the kind of sophisticated analysis Google can do with search data to enable highly targeted medical marketing, he said. "This is about taking the tracking data that Google has at its disposal and focusing it on generating a new profit center for the company," Chester said.

Pam Dixon, executive director of the World Privacy Forum, echoed those concerns and questioned whether the anonymization techniques used by Google provided enough of a guarantee that a search term could not be traced back to specific individuals. She pointed to an incident two years ago in which AOL inadvertently posted search information on a public Web site. The search queries had supposedly been anonymized by AOL, but it was still relatively easy to track specific search terms back to IP addresses and even individuals in many cases, Dixon said.

Mike Yang, senior product counsel at Google, downplayed privacy concerns related to Flu Trends and insisted that the tool uses no personally identifiable data.

"Flu Trends uses aggregated data from hundreds of millions of searches over time," Yang said today in an e-mail. "Flu Trends uses aggregations of search query data which contain no information that can identify users personally. We also never reveal how many users are searching for particular queries."

Yang noted that the data used in Flu Trends comes from Google's standard search logs. He also referenced an article in the journal Nature, authored by the Google Flu Trends team, which he said explains the methodology behind the tool.
