Open-source spam-blocker gets high marks at Cornell

The CIO of Cornell's Johnson Graduate School of Management said the antispam tool is 99% effective in blocking unwanted e-mail

When the academic year begins this fall, students at Cornell University's Johnson Graduate School of Management will be armed with what its CIO sees as a powerful new weapon to battle spam.

For the past two months, the school's IT organization has been beta-testing an open-source tool called the SpamBayes Outlook Plug-in and is preparing for a broad rollout.

The SpamBayes tool blocks spam using a unique form of statistical analysis that's far more efficient and customizable than any commercially available antispam product, according to Larry Fresinski, the school's CIO.

"It's been extraordinarily effective," he said. "It catches 99% of my spam." Fresinski said he has contacted 20 other business schools to inform them about the technology.

The university has been testing the SpamBayes Outlook Plug-in with Microsoft Corp.'s Outlook XP, Outlook 2003 Beta and an Exchange 2000 server. Cornell's management school is a beta tester of Outlook 2003, which, like other e-mail products, comes with its own antispam technology. As a tester of SpamBayes, the Ithaca, N.Y.-based school has recommended the approach to Microsoft, Fresinski said.

SpamBayes is the name of an open-source project working to develop an antispam filter based on Bayesian statistics, a form of statistical analysis that estimates how likely something is from previously observed evidence.

The approach is different from traditional antispam technologies that use predefined rules to look for specific features or words in mail headers and body text to identify unsolicited mail. Many of these technologies also use blacklists to block mail from certain addresses.
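For contrast, the traditional approach can be sketched in a few lines of Python. The rules and the blacklisted address below are hypothetical, made up for illustration; they are not taken from any real product.

```python
import re

# Hypothetical hand-written rules: fixed patterns assumed to indicate spam
# for every recipient, regardless of what that person actually wants.
RULES = [
    re.compile(r"\bfree money\b", re.IGNORECASE),
    re.compile(r"\bact now\b", re.IGNORECASE),
    re.compile(r"!!!"),
]

# Hypothetical blacklist: mail from these addresses is always blocked.
BLACKLIST = {"bulk@spam.example"}


def rule_based_is_spam(sender, subject, body):
    """Flag a message if its sender is blacklisted or any fixed rule matches."""
    if sender.lower() in BLACKLIST:
        return True
    text = subject + "\n" + body
    return any(rule.search(text) for rule in RULES)
```

Every recipient gets the same verdict, because the rules encode one general description of spam rather than each user's own.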

The problem with such approaches is that they rely on a predefined and general description of spam and not on a user-specific definition of the term, Fresinski said.

SpamBayes first analyzes a user's legitimate e-mail and spam for clues to what distinguishes them. It then applies those clues to the headers, content and style of incoming messages to determine whether they are spam.
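A minimal Python sketch of how one raw message might be turned into clues drawn from its headers, content and style. The clue names (such as `subject:free` or `style:mostly_caps`) and the specific style checks are hypothetical choices for illustration; SpamBayes's own tokenizer is considerably more elaborate and is not reproduced here.

```python
import email
import re


def extract_clues(raw_message):
    """Turn one raw message into a set of clue strings (hypothetical scheme)."""
    msg = email.message_from_string(raw_message)
    clues = set()

    # Header clues: prefix each word with its header name, so "free" in the
    # Subject line counts as a different clue from "free" in the body.
    for name in ("Subject", "From"):
        value = msg.get(name, "")
        for word in re.findall(r"[\w$'@.-]+", value.lower()):
            clues.add(f"{name.lower()}:{word}")

    # Content clues: plain lowercase words from a single-part body.
    body = "" if msg.is_multipart() else msg.get_payload()
    for word in re.findall(r"[a-z0-9$']+", body.lower()):
        clues.add(word)

    # Style clues: coarse features of how the text is written.
    letters = [c for c in body if c.isalpha()]
    if letters and sum(c.isupper() for c in letters) / len(letters) > 0.5:
        clues.add("style:mostly_caps")
    if "!!!" in body:
        clues.add("style:repeated_exclamation")
    return clues
```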

The greater the number of initial samples and the broader the variety, the more quickly Bayesian filters can be "trained" to recognize spam, said Brian Burton, president of Burton Computer Corp., a consultancy in LaVale, Md. The company has developed an open-source tool called SpamProbe, which uses similar techniques to block spam.
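Why sample size matters can be seen in the word-probability formula proposed by Gary Robinson, whose work SpamBayes draws on. The exact form and constants SpamBayes uses are not given in this article, so treat the following as an illustration. For a word w, let b(w) and g(w) be the fractions of spam and good training messages that contain it, n the number of training messages containing it, x a neutral guess for unseen words, and s the weight given to that guess:

```latex
p(w) = \frac{b(w)}{b(w) + g(w)}, \qquad
f(w) = \frac{s\,x + n\,p(w)}{s + n}

\text{With } s = 1,\; x = 0.5,\; p(w) = 1: \quad
f(w)\big|_{n=1} = \frac{0.5 + 1}{2} = 0.75, \qquad
f(w)\big|_{n=20} = \frac{0.5 + 20}{21} \approx 0.976
```

The values s = 1 and x = 0.5 are chosen here for round numbers. A word seen in a single spam message is only weak evidence; after 20 such messages it becomes strong evidence. A filter trained on few samples therefore treats its clues cautiously, which is the start-up weakness Burton describes.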

"That is one of the weaknesses of this approach," Burton said. "You've got to get it to a point where it can start making the right decisions."

Although SpamBayes won't stop spam from reaching Cornell's mail servers, it will let end users filter it out more effectively, Fresinski said. So far, there hasn't been one instance in which the software has stopped legitimate mail from getting through or failed to stop spam, he said.

"It's open-source software. It's free," Fresinski said. "The beauty of it is that it continually learns what is spam to you, and not [to] some external database."

How It Works


It first analyzes samples of legitimate ("good") e-mail and spam.

It builds a database of clues from these samples to ascertain what differentiates the two.

It uses these clues to examine new messages and calculate the probability that the messages are spam.
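The three steps can be sketched end to end in Python. This is a simplified illustration, not SpamBayes code. It counts the training messages that contain each word (steps one and two), converts those counts into a per-word spam probability that is pulled toward 0.5 when a word has rarely been seen (Robinson's f(w)), and combines the strongest clues with a chi-squared test in the style of Robinson's combining method, which SpamBayes adopted (step three). The class and method names, the word tokenizer and the default parameters (`strength`, `neutral`, `min_strength`) are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Crude tokenizer: the distinct lowercase words in a message."""
    return set(re.findall(r"[a-z0-9$']+", text.lower()))


def chi2q(x2, v):
    """Upper-tail probability of a chi-squared value x2 with even degrees of freedom v."""
    m = x2 / 2.0
    total = term = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


class BayesFilter:
    """Hypothetical per-user filter; parameter defaults are assumptions."""

    def __init__(self, strength=1.0, neutral=0.5, min_strength=0.1):
        self.s = strength                 # weight given to the neutral guess
        self.x = neutral                  # probability assumed for unseen words
        self.min_strength = min_strength  # ignore clues this close to 0.5
        self.spam_counts = Counter()      # word -> spam messages containing it
        self.ham_counts = Counter()       # word -> good messages containing it
        self.nspam = 0
        self.nham = 0

    def train(self, text, is_spam):
        """Steps one and two: add one message's words to the clue database."""
        words = tokenize(text)
        if is_spam:
            self.spam_counts.update(words)
            self.nspam += 1
        else:
            self.ham_counts.update(words)
            self.nham += 1

    def word_prob(self, word):
        """Spam probability of one word, shrunk toward the neutral guess."""
        ns, nh = self.spam_counts[word], self.ham_counts[word]
        b = ns / self.nspam if self.nspam else 0.0
        g = nh / self.nham if self.nham else 0.0
        if b + g == 0:
            return self.x
        p = b / (b + g)
        n = ns + nh
        return (self.s * self.x + n * p) / (self.s + n)

    def score(self, text):
        """Step three: combine the strongest clues into a spam probability in [0, 1]."""
        probs = [self.word_prob(w) for w in tokenize(text)]
        clues = [p for p in probs if abs(p - 0.5) >= self.min_strength]
        if not clues:
            return 0.5  # no evidence either way
        n = len(clues)
        spamminess = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in clues), 2 * n)
        hamminess = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in clues), 2 * n)
        return (1.0 + spamminess - hamminess) / 2.0
```

Calling `train` again whenever the user marks a message as spam or good updates the counts in place, which is one simple way to get the continual, per-user learning Fresinski describes.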

Copyright © 2003 IDG Communications, Inc.
