Encrypted Web traffic can reveal highly sensitive information

Merely knowing what pages a person views on a website can hint at their personal life

Analyzing encrypted Web traffic can potentially reveal highly sensitive information such as medical conditions and sexual orientation, according to a research paper that forecasts how privacy on the Internet may erode.

In a paper titled "I Know Why You Went to the Clinic," researchers show that by observing encrypted Web traffic and identifying patterns, it is possible to determine which pages a person has visited on a website, offering clues about their personal life. The paper will be presented July 16 at the Privacy Enhancing Technologies Symposium in Amsterdam.

Almost all websites that exchange sensitive data rely on SSL/TLS (Secure Sockets Layer/Transport Layer Security), which encrypts data exchanged between a person's computer and a server.

The encrypted data itself is unreadable, but the researchers developed a traffic analysis attack that can identify which individual pages on a website a person has browsed with about 80 percent accuracy. Previous research had shown such analysis was possible, but with an accuracy rate of 60 percent.

They evaluated the attack on 6,000 Web pages across 10 websites: the Mayo Clinic, Planned Parenthood, Kaiser Permanente, Wells Fargo, Bank of America, Vanguard, the ACLU, LegalZoom, Netflix and YouTube.

Encrypted page views on health care websites, for example, "have the potential to reveal whether a pending procedure is an appendectomy or an abortion, or whether a chronic medication is for diabetes or HIV/AIDS," they wrote.

"These types of distinctions and others can form the basis for discrimination or persecution and represent an easy opportunity to target advertising for products which consumers are highly motivated to purchase," according to the paper.

To carry out a traffic analysis attack, an adversary would need to know the encrypted traffic patterns of a particular site and be able to observe the victim's Web traffic. ISPs and employers, for example, would have visibility into users' data streams, they wrote.
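To make the attack concrete, here is a toy Python sketch of how an observer might match an encrypted trace against previously recorded page fingerprints. It is not the researchers' classifier, which is more sophisticated. It assumes each trace is a list of packet sizes in bytes, positive for traffic sent by the browser and negative for traffic received, and the page labels and sizes are made up for illustration. Only packet sizes and directions are used; the encrypted contents are never examined.

```python
# Toy website-fingerprinting sketch (not the paper's method).
# A trace is a list of packet sizes in bytes: positive = sent by the
# browser, negative = received from the server. Only sizes and directions
# are used; the encrypted contents are never inspected.

def bursts(trace):
    """Sum consecutive packets that travel in the same direction."""
    result = []
    for size in trace:
        if result and (result[-1] > 0) == (size > 0):
            result[-1] += size
        else:
            result.append(size)
    return result


def distance(a, b):
    """L1 distance between two burst sequences, zero-padding the shorter one."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return sum(abs(x - y) for x, y in zip(a, b))


def classify(trace, fingerprints):
    """Return the label of the recorded page whose bursts best match the trace."""
    observed = bursts(trace)
    return min(fingerprints,
               key=lambda label: distance(observed, bursts(fingerprints[label])))


# Hypothetical fingerprints the attacker recorded by visiting each page.
fingerprints = {
    "/procedures/appendectomy": [300, -1500, -1500, -400, 200, -900],
    "/conditions/diabetes": [300, -1500, -600, 250, -1500, -1500, -1200],
}

# A victim's trace observed on the wire (sizes vary slightly between visits).
observed = [310, -1500, -1480, -420, 190, -880]
print(classify(observed, fingerprints))  # prints /procedures/appendectomy
```

The observed bursts differ from the first fingerprint by only a few dozen bytes and from the second by thousands, so the sketch labels the trace as the first page. A practical attack has to tolerate much more variation between visits than this simple distance measure handles.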

Defenses against such analysis work by modifying packet sizes to make traffic less vulnerable to pattern recognition, they wrote.

A "linear" defense pads packet sizes up to multiples of 128 bytes, while an "exponential" defense pads them up to powers of two. Another approach is to fragment packets randomly, which has the advantage of not generating additional data.
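These padding rules are simple arithmetic on packet sizes. The Python sketch below assumes the 128 in the linear scheme is measured in bytes and shows one hypothetical way random fragmentation could split a packet; the function names are illustrative, not taken from the paper.

```python
import random


def linear_pad(size, block=128):
    """Round a packet size (bytes) up to the next multiple of `block`."""
    return -(-size // block) * block  # ceiling division without floats


def exponential_pad(size):
    """Round a packet size (bytes, >= 1) up to the next power of two."""
    return 1 << (size - 1).bit_length() if size > 1 else 1


def random_fragment(size, rng=random):
    """Split a packet into randomly sized pieces that add up to `size`.

    No padding bytes are added; only the observable packet boundaries change.
    """
    pieces = []
    remaining = size
    while remaining > 0:
        piece = rng.randint(1, remaining)
        pieces.append(piece)
        remaining -= piece
    return pieces


for s in (1, 129, 1000):
    print(s, linear_pad(s), exponential_pad(s))
# 1 128 1
# 129 256 256
# 1000 1024 1024
print(random_fragment(1500, random.Random(42)))
```

Linear padding adds at most 127 bytes per packet, while exponential padding can nearly double one (129 bytes becomes 256). Fragmentation sends exactly the original number of bytes, which matches the point that it generates no additional data.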

"The Burst defense offers greater protection, operating between the TCP layer and application layer to pad contiguous bursts of traffic up to pre-defined thresholds uniquely determined for each website," the paper said. "The Burst defense allows for a natural tradeoff between performance and cost, as fewer thresholds will result in greater privacy but at the expense of increased padding."
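The Burst defense can be sketched the same way. The Python below is an illustrative simulation, not the researchers' implementation: it groups consecutive same-direction packets into bursts and pads each burst up to the smallest threshold that fits it. The threshold values are hypothetical, the paper's method for deriving per-website thresholds is not shown, and rounding oversized bursts up to a multiple of the largest threshold is an assumption of this sketch.

```python
import bisect


def bursts(trace):
    """Sum consecutive packets in the same direction (+ sent, - received), in bytes."""
    result = []
    for size in trace:
        if result and (result[-1] > 0) == (size > 0):
            result[-1] += size
        else:
            result.append(size)
    return result


def burst_pad(trace, thresholds):
    """Pad every burst up to the smallest threshold >= its size.

    Returns the padded burst sizes (sign = direction) and the total padding
    added. Bursts bigger than the largest threshold are rounded up to a
    multiple of it (an assumption made for this sketch).
    """
    thresholds = sorted(thresholds)
    padded, overhead = [], 0
    for burst in bursts(trace):
        direction = 1 if burst > 0 else -1
        size = abs(burst)
        i = bisect.bisect_left(thresholds, size)
        if i < len(thresholds):
            target = thresholds[i]
        else:
            top = thresholds[-1]
            target = -(-size // top) * top
        padded.append(direction * target)
        overhead += target - size
    return padded, overhead


trace = [310, -1500, -1480, -420, 190, -880]
print(burst_pad(trace, [512, 2048, 4096]))  # ([512, -4096, 512, -2048], 2388)
print(burst_pad(trace, [4096]))             # ([4096, -4096, 4096, -4096], 11604)
```

With three thresholds the padded bursts still differ from one another; with a single threshold every burst looks identical to an observer, at the cost of almost five times as much padding. That is the privacy-versus-overhead tradeoff described in the quote.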

There are still complications that hamper pattern identification in encrypted web traffic. For example, different operating systems, devices and locations of devices could make the Web traffic appear more diverse and harder to identify.

The research also assumes that a person is browsing the Web in a single browser tab. It was unclear to the researchers how much traffic other open tabs might generate and whether it could be separated out.

Those conditions would also impact how to defend against an attack, as "realistic conditions may substantially contribute to an effective defense," they wrote.

The paper was co-authored by Brad Miller, A.D. Joseph and J.D. Tygar of the University of California at Berkeley and Ling Huang of Intel Labs.

Send news tips and comments to jeremy_kirk@idg.com. Follow me on Twitter: @jeremy_kirk
