Microsoft analyzes web searches, finds clues for early cancer detection

Analyzing web queries could provide early pancreatic cancer detection five months before diagnosis

Analyzing online activities can provide clues as to a person's chances of having cancer, Microsoft researchers showed in a paper published this week.

Specifically, the researchers demonstrated that by analyzing web query logs they were able to identify internet users who had pancreatic cancer even before they'd been diagnosed. The research is part of a larger trend in which data analytics is being used to improve healthcare.

The study suggests that "low-cost, high-coverage surveillance systems" can be created to passively observe search behavior and to provide early warning for pancreatic cancer and, with extension of the methodology, for other challenging cancers, the researchers concluded. "Surveillance systems could also provide for automated capture and summarization of data and landmarks over time so as to provide patients with talking points in their discussion with medical professionals."

The researchers used proprietary logs of 9.2 million web queries from Microsoft's own Bing search engine, focusing exclusively on English-speaking people in the U.S. from October 2013 to May 2015. They tracked the characteristics of users' search and click activities to capture intentions, which provided the data used to construct a statistical model.

The study team, made up of Microsoft researchers Dr. Eric Horvitz and Dr. Ryen White and Columbia University graduate student John Paparrizos, said they anonymized the data but gave each search an identifier linked to the Web browser. That enabled the extraction of search log histories.

First, the team identified searchers in logs of online search activity who made "special queries" that are suggestive of a recent diagnosis of pancreatic cancer. Those queries included phrases such as "Why did I get cancer in pancreas," and "I was told I have pancreatic cancer, what to expect."
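To make that first step concrete, here is a minimal sketch of how first-person diagnostic queries might be flagged. The two patterns and the function name `looks_like_diagnosis_query` are illustrative assumptions; the article does not describe the rules the researchers actually used.

```python
import re

# Hypothetical patterns for illustration only; the researchers' actual
# query rules are not described in the article.
FIRST_PERSON = re.compile(r"\b(i|my|me)\b", re.IGNORECASE)
PANCREATIC_CANCER = re.compile(
    r"pancreatic cancer|cancer (in|of) (the |my )?pancreas",
    re.IGNORECASE,
)


def looks_like_diagnosis_query(query: str) -> bool:
    """Return True when a query mentions pancreatic cancer in the first person."""
    has_first_person = FIRST_PERSON.search(query) is not None
    mentions_cancer = PANCREATIC_CANCER.search(query) is not None
    return has_first_person and mentions_cancer
```

A check this simple would flag both example queries quoted above while ignoring a purely informational search such as "pancreatic cancer symptoms." A real system would need far more careful handling of phrasing and of queries made on behalf of someone else.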

The researchers also used special Bing-created filters to weed out queries from users for whom more than 20% of queries were health related, on the assumption that those searches were being performed by healthcare professionals. That left 7.2 million web queries to examine.

The team then went back "many months" before those initial queries to examine how symptoms of pancreatic cancer surfaced in the same users' earlier web searches.

"We showed specifically that we can identify 5% to 15% of cases, while preserving extremely low false-positive rates," the researchers said in their paper. The false-positive rates ranged from one in 10,000 to one in 100,000.
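To put those rates in perspective, the short sketch below works through the arithmetic for a made-up population. The function names and the population sizes in the usage note are hypothetical; only the 5% to 15% detection range and the one-in-10,000 to one-in-100,000 false-positive range come from the article.

```python
def expected_detections(true_cases: int, detected_percent: int) -> float:
    """Cases a screen would flag, given the share of true cases it identifies."""
    return true_cases * detected_percent / 100


def expected_false_alarms(unaffected_searchers: int, one_in: int) -> float:
    """Unaffected searchers wrongly flagged at a false-positive rate of 1 in `one_in`."""
    return unaffected_searchers / one_in
```

For a hypothetical 1,000 true cases, `expected_detections(1_000, 5)` and `expected_detections(1_000, 15)` bracket the number of cases caught. For a hypothetical 1,000,000 unaffected searchers, `expected_false_alarms(1_000_000, 10_000)` and `expected_false_alarms(1_000_000, 100_000)` bracket the number of false alarms.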

Microsoft researchers showed that they can identify 5% to 15% of users diagnosed with pancreatic cancer based on their previous search history.

Unlike many other cancers, which can be slow growing, pancreatic cancer is among the most aggressive, meaning early diagnosis can lead to better outcomes.

Additionally, early signs and symptoms of pancreatic cancer are subtle and often present as nonspecific symptoms that appear and evolve over time, the researchers noted.

The results of analyzing web queries pointed to early cancer detection -- as much as five months before a physician's diagnosis.

"Web search logs may offer a useful source of signals for pancreatic [cancer] screening, with significant lead time," the researchers said. "Because pancreatic [cancer] may progress from stage I to stage IV in just over 1 year, this screening capability could increase 5-year survival."
