NSA's alleged phone-records program puts spotlight on data mining

High-end tools are needed to sift through info in search of patterns

The controversy over the National Security Agency's terrorism-related surveillance efforts, including its purported program for collecting domestic telephone data, is shining a spotlight on the esoteric arena of high-end data mining.

One IT vendor that has been publicly linked to the NSA is Narus Inc., a Mountain View, Calif.-based company that sells systems for intercepting and analyzing telecommunications and network traffic. In an affidavit submitted in April as part of a lawsuit filed against AT&T Inc. by the Electronic Frontier Foundation (EFF), Mark Klein, a retired AT&T communications technician, said that in 2004, he saw a document listing Narus' technology among the equipment installed in a "secret room" at an AT&T central-office facility in San Francisco -- allegedly at the direction of an NSA agent.

The EFF filed a class-action lawsuit against AT&T in U.S. District Court in San Francisco on Jan. 31, claiming that the telecommunications carrier is violating federal law by letting the NSA wiretap its customers without warrants.

Steven Bannerman, vice president of marketing at Narus, declined to confirm or deny that his company is involved with the NSA and AT&T. But he readily acknowledged that its technology has the ability to sift through large amounts of network data in search of targeted information.

Narus' traffic processing engine can inspect data at speeds of up to 10Gbit/sec. while performing deep inspections of the content of network packets, including telephone calls, e-mail text and streaming video, Bannerman said. He claimed that the technology enables network operators to spot viruses and identify human targets, such as spammers or potential terrorists.

The equipment comes with optional lawful-intercept features designed to help ensure that only network packets presumed to originate from a court-approved target are tracked, and only for as long as a warrant remains in effect. But, Bannerman noted, "once we sell the product to customers, there's no mechanism in the software to check whether or not they are using the warrant management system."

The device that collects the packets is paired with an Intel-based "logic" server that runs Red Hat Linux and analyzes packets in real time for preconfigured targets such as IP addresses or "voice prints," he said. It also can check for anomalous patterns.
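
To make the idea of warrant-limited target matching concrete, here is a minimal sketch in Python. It is not Narus' implementation, which the article does not describe in detail. The `Packet` and `Warrant` types, the `build_index` and `should_track` functions, and the sample addresses are all hypothetical. The sketch assumes that a target is a single IP address and that a packet is tracked only if its source address matches a target whose warrant is in effect at the time the packet is seen.

```python
from dataclasses import dataclass
from datetime import datetime
from ipaddress import ip_address


@dataclass(frozen=True)
class Packet:
    """Hypothetical packet summary: source, destination and capture time."""
    src: str
    dst: str
    seen_at: datetime


@dataclass(frozen=True)
class Warrant:
    """Hypothetical warrant: one target address and a validity window."""
    target_ip: str
    valid_from: datetime
    valid_until: datetime


def build_index(warrants):
    """Group warrants by normalized target address for fast lookup."""
    index = {}
    for warrant in warrants:
        index.setdefault(ip_address(warrant.target_ip), []).append(warrant)
    return index


def should_track(packet, index):
    """Return True only if the packet's source is a target with a live warrant."""
    for warrant in index.get(ip_address(packet.src), []):
        if warrant.valid_from <= packet.seen_at <= warrant.valid_until:
            return True
    return False


# Usage with documentation-range addresses (illustrative values only).
warrants = [Warrant("192.0.2.10", datetime(2006, 1, 1), datetime(2006, 1, 31))]
index = build_index(warrants)
in_window = Packet("192.0.2.10", "198.51.100.5", datetime(2006, 1, 15))
expired = Packet("192.0.2.10", "198.51.100.5", datetime(2006, 2, 2))
print(should_track(in_window, index), should_track(expired, index))
```

As Bannerman's comment suggests, a check like this constrains tracking only if the operator actually routes traffic through it.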

Determining what patterns to scan for is done separately, typically by using data mining and business intelligence tools to analyze information stored in a data warehouse.

Stephen Brobst, chief technology officer at Teradata in Dayton, Ohio, declined to comment on whether the NSA is using the NCR Corp. division's data warehousing software. But he acknowledged that Teradata's technology is popular with telecom carriers and network services providers for storing and analyzing the massive volumes of call data records and network traffic information they collect.

For instance, Brobst said that AT&T's Daytona data warehouse, which it built in-house partially using Teradata technology, stores 1.88 trillion call records that amount to more than 312TB of data.

Richard Winter, president of Winter Corp., a Waltham, Mass.-based consulting firm that produces an annual report on the largest databases in use, said data warehouses usually require five times the storage capacity that's needed for the data alone.
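
The figures quoted above lend themselves to a quick back-of-the-envelope calculation. The Python sketch below works out the average size of a call record from Brobst's numbers and then applies Winter's five-times rule of thumb. Two assumptions are ours, not the article's: that a terabyte means 10^12 bytes, and that Winter's general ratio can be applied to the Daytona figure purely for illustration. Because the article says "more than 312TB," the results are lower bounds, not a description of AT&T's actual hardware.

```python
CALL_RECORDS = 1.88e12   # call records, as quoted by Brobst
DATA_TB = 312            # data volume in TB, quoted as "more than 312TB"
BYTES_PER_TB = 10**12    # assumption: decimal terabytes
STORAGE_MULTIPLIER = 5   # Winter's rule of thumb for data warehouses


def bytes_per_record(data_tb, records):
    """Average number of bytes stored per call record."""
    return data_tb * BYTES_PER_TB / records


def raw_storage_tb(data_tb, multiplier=STORAGE_MULTIPLIER):
    """Storage capacity implied by the rule of thumb, in TB."""
    return data_tb * multiplier


print(round(bytes_per_record(DATA_TB, CALL_RECORDS)))  # about 166 bytes
print(raw_storage_tb(DATA_TB))                         # 1560 TB
```

At roughly 166 bytes apiece, each record has room for little more than a pair of phone numbers, a time stamp, a duration and some routing details.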

The RAID technology that's designed to back up and protect data takes up extra space, Winter noted. Moreover, although the amount of data that disks can contain per spindle doubles every year, the rates at which the disks spin and at which the arms that hold the read-write heads move haven't changed much, according to Winter. "The result of that is, to get good performance on a normal data warehouse, you have to leave the disks partly empty," he said.

Some analysts argue that social network analysis, the data mining technique used most often to determine interconnections between people, isn't particularly effective with call data records alone.

"If the only data you have is what phone number calls what number and how long they talk, trying to figure out who is a terrorist through this 'top-down approach' is impossible," said Valdis Krebs, a Cleveland-based consultant who has done work for many defense and federal government IT contractors.

But Brobst said that social network analysis has long been used by telephone companies to do sophisticated calculations for purposes such as figuring out how to best structure their friends-and-family calling plans to appeal to customers and maximize their profits. "The whole point is that you don't know exactly what you're looking for, so you use data mining to search for patterns," Brobst said. "Going the other direction is easy."
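
For readers unfamiliar with social network analysis, the Python sketch below shows its simplest form on call data records: build a graph in which phone numbers are nodes and calls are links, then walk outward from one known number. That is one possible reading of the "easy" direction Brobst mentions, as opposed to searching the whole data set for unknown patterns. The record layout (caller, callee, duration in seconds), the function names and the sample numbers are all hypothetical, and real tools apply far richer measures than hop counts.

```python
from collections import defaultdict, deque


def build_call_graph(records):
    """Build an undirected graph linking every pair of numbers that shared a call."""
    graph = defaultdict(set)
    for caller, callee, _duration in records:
        graph[caller].add(callee)
        graph[callee].add(caller)
    return graph


def contacts_within(graph, seed, max_hops):
    """Breadth-first search: map each number reachable from seed to its hop count."""
    hops = {seed: 0}
    queue = deque([seed])
    while queue:
        number = queue.popleft()
        if hops[number] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(number, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[number] + 1
                queue.append(neighbor)
    del hops[seed]  # report contacts only, not the starting number
    return hops


# Illustrative records using fictional 555 numbers.
records = [
    ("555-0100", "555-0101", 60),
    ("555-0101", "555-0102", 30),
    ("555-0102", "555-0103", 10),
    ("555-0110", "555-0111", 5),
]
graph = build_call_graph(records)
print(contacts_within(graph, "555-0100", 2))
```

The sketch ignores call duration and frequency. With only who-called-whom data, as Krebs notes, a link in the graph says nothing about why two people spoke.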

Not surprisingly, the NSA isn't talking about its data collection and mining activities. "Given the nature of the work we do, it would be irresponsible to comment on actual or alleged operational issues; therefore, we have no information to provide," NSA spokesman Don Weber said via e-mail. "However, it is important to note that NSA takes its legal responsibilities seriously and operates within the law."

Copyright © 2006 IDG Communications, Inc.
