8 ways to fight spam filter frustration

False positives are the scourge of spam filters. Whether you're sending or receiving, here are some steps you can take to keep good e-mail out of the slush pile.

Spam. It fills our in-boxes, wastes our time and spreads malware -- and it's only getting worse. According to Ferris Research, which studies messaging and content control, 40 trillion spam messages are expected to be sent in 2008, costing businesses more than $140 billion worldwide -- a significant increase from the 18 trillion spam messages sent in 2006 and the 30 trillion in 2007.

In theory, e-mail filtering software and appliances allow "good" or "true" e-mail messages to pass through while prohibiting spam. But the filters can err in either of two ways: They can mistakenly allow spam to pass through, believing it to be true e-mail (known as a "false negative" situation), or they can mistakenly block true e-mail, believing it to be spam (a "false positive").

Typically, after identifying a message as spam, the filtering software either blocks it outright or places it in a quarantine folder, allowing the recipient to review it later. Although the latter method provides a chance to retrieve false positives, it requires time and effort from the user -- and some users never bother to check their quarantine folders at all.

Users and organizations that receive spam incur a cost in deleting it -- about $0.04 per message, according to Ferris Research. But Ferris analyst Richi Jennings points out that locating a missing true e-mail costs far more: about $3.50 per message.

(Ferris developed these figures using published data on such factors as labor size and hourly labor costs, then applied its own estimates, such as the percentage of workforces having e-mail access and volumes of spam messages. A downloadable spreadsheet [registration required] illustrates Ferris' model.)
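A rough calculation shows how lopsided the two per-message figures are. The sketch below uses only the $0.04 and $3.50 estimates quoted above. The function names and the sample message counts are illustrative assumptions, not part of Ferris' model.

```python
# Per-message costs in cents, from the Ferris Research estimates quoted above.
# Integer cents are used so the arithmetic stays exact.
COST_DELETE_SPAM_CENTS = 4        # deleting one spam message ($0.04)
COST_FIND_LOST_MAIL_CENTS = 350   # locating one missing true e-mail ($3.50)


def handling_cost_dollars(spam_deleted, lost_messages_located):
    """Total handling cost in dollars for the given (hypothetical) message counts."""
    cents = (spam_deleted * COST_DELETE_SPAM_CENTS
             + lost_messages_located * COST_FIND_LOST_MAIL_CENTS)
    return cents / 100


def spam_equivalent_of_one_false_positive():
    """How many spam deletions cost as much as locating one lost message."""
    return COST_FIND_LOST_MAIL_CENTS / COST_DELETE_SPAM_CENTS
```

By these figures, hunting down a single false positive costs as much as deleting 87.5 spam messages. A user who deletes 500 spam messages and tracks down three lost messages would incur $30.50 in handling costs, about a third of it from just those three messages.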

Even worse, Jennings says, organizations incur potentially greater costs through missed opportunities because of false positives that they never see -- for example, a consulting firm that fails to receive a request for proposal.

Filtering techniques

To minimize the false positives caused by spam filters, it helps to know a bit about how they work. To keep up with ever more sophisticated spam, filters have adopted a variety of techniques over the years, often in combination with one another. Here is a bird's-eye view of some popular techniques, in rough chronological order:

Keyword-based and Bayesian filters

The earliest filters searched a subject line and message body for particular words, such as "Viagra" or "online pharmacy." More sophisticated versions employ Bayesian analyses, which combine keyword searches with techniques such as determining ratios of "good" to "bad" words and assigning probability scores based on these ratios.
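To make the idea of probability scoring concrete, here is a minimal sketch of a naive Bayes word filter. It is one common way to implement Bayesian filtering, not a description of any particular product. The class name, the "spam"/"ham" labels, the whitespace tokenizer and the add-one smoothing are all assumptions chosen for brevity.

```python
import math
from collections import Counter


class TinyBayesFilter:
    """Illustrative naive Bayes filter; real products are far more elaborate."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record the words of one message known to be 'spam' or 'ham'."""
        self.word_counts[label].update(text.lower().split())
        self.message_counts[label] += 1

    def spam_probability(self, text):
        """Return an estimate between 0 and 1 that the text is spam."""
        if 0 in self.message_counts.values():
            raise ValueError("train on at least one spam and one ham message first")
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_score = {}
        for label in ("spam", "ham"):
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Start from the prior: how common this class was in training.
            score = math.log(self.message_counts[label] / total_messages)
            for word in text.lower().split():
                # Add-one smoothing keeps unseen words from zeroing the score.
                score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
            log_score[label] = score
        # Turn the two log scores into a single spam probability.
        return 1 / (1 + math.exp(log_score["ham"] - log_score["spam"]))
```

A filter built this way would typically compare the returned probability against a threshold. Where that threshold sits determines the trade-off between false negatives and false positives.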

Challenge response

Unrecognized senders receive a reply asking them to validate themselves by supplying letters and characters that appear in images onscreen, a technique also known as CAPTCHA (completely automated public Turing test to tell computers and humans apart). This test is based on the idea that humans can detect and input certain patterns, while computers are unable to do so. Once a sender has been validated, later messages from that sender are delivered without the challenge step.
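The hold-and-release flow behind challenge response can be sketched in a few lines. This is a simplified illustration: a random text token stands in for the image-based CAPTCHA, and the class and method names are hypothetical.

```python
import secrets


class ChallengeResponseGate:
    """Holds mail from unknown senders until they answer a challenge."""

    def __init__(self):
        self.validated = set()   # senders who have already passed a challenge
        self.pending = {}        # sender -> (challenge, list of held messages)
        self.inbox = []          # delivered (sender, message) pairs

    def receive(self, sender, message):
        """Deliver mail from validated senders; hold the rest.

        Returns the challenge to send back, or None if the message was delivered.
        """
        if sender in self.validated:
            self.inbox.append((sender, message))
            return None
        if sender not in self.pending:
            # A random token is a stand-in for a CAPTCHA image here.
            self.pending[sender] = (secrets.token_hex(4), [])
        challenge, held = self.pending[sender]
        held.append(message)
        return challenge

    def answer(self, sender, response):
        """Validate the sender and release held mail if the response matches."""
        if sender not in self.pending:
            return False
        challenge, held = self.pending[sender]
        if response != challenge:
            return False
        self.validated.add(sender)
        self.inbox.extend((sender, held_message) for held_message in held)
        del self.pending[sender]
        return True
```

The sketch also shows where false positives creep in: a legitimate sender who never answers the challenge stays in the pending queue indefinitely.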

Blacklisting, whitelisting and reputation listing

With these techniques, the filter evaluates not the message, but the characteristics of the sender, in particular the sender's previous record concerning spam.

  • Blacklists are databases that collect the IP addresses of known spammers from around the world. The spam filter checks incoming messages against the blacklist and refuses to accept e-mail from these addresses. Depending on the specific spam filter product used, the blacklist it checks could be local (i.e., maintained at a company's own network), remote (maintained centrally, independent of a specific company) or a combination. Some centrally maintained blacklists are publicly available, while others are fee-based services.
  • Whitelists collect the IP addresses of trusted e-mail sources on a "good sender" list, and the filter automatically accepts e-mail from those addresses. As with blacklists, a spam filter product could check a local whitelist, a centrally maintained one or both. Many spam filters make use of both blacklists and whitelists.
  • The term reputation service (or reputation list) is sometimes used to refer to a technique that makes use of blacklists and whitelists but broadens them by considering not only the sending IP address, but the entire domain. However, the terminology is used inconsistently in the industry, with the terms "blacklist" and "reputation list" often used interchangeably.
  • In some cases, vendors use "reputation service" or "reputation list" to differentiate their lists from the community heritage of blacklists and whitelists. But Jennings cautions against buying into the idea that reputation lists are all run professionally, while all blacklists are "cesspools of false positives." He says he has yet to see a reputation list that's truly different from traditional blacklists or whitelists.
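The list-based checks described above can be sketched as a single lookup function. Everything specific here is an assumption for illustration: the function name, the whitelist-before-blacklist precedence, the numeric domain scores and the sample addresses, which come from ranges reserved for documentation. A real filter would also query remote lists rather than only local ones.

```python
import ipaddress


def classify_sender(ip, domain, whitelist, blacklist, domain_reputation):
    """Return 'accept', 'reject' or 'scan' for a sending IP address and domain.

    whitelist / blacklist: iterables of ipaddress network objects (local lists).
    domain_reputation: dict mapping a domain to a score; negative means bad.
    """
    address = ipaddress.ip_address(ip)
    # Assumed precedence: a trusted sender is accepted even if also blacklisted.
    if any(address in network for network in whitelist):
        return "accept"
    if any(address in network for network in blacklist):
        return "reject"
    # Reputation-style check: look beyond the IP address to the whole domain.
    score = domain_reputation.get(domain.lower())
    if score is not None and score < 0:
        return "reject"
    # Unknown senders fall through to content-based filtering.
    return "scan"


# Hypothetical local lists.
WHITELIST = [ipaddress.ip_network("192.0.2.0/24")]
BLACKLIST = [ipaddress.ip_network("198.51.100.0/24")]
DOMAIN_REPUTATION = {"bulk-mailer.example": -5, "partner.example": 3}
```

With these lists, a message from 198.51.100.7 is rejected whatever its content. That is exactly how a legitimate sender who shares an address range with a spammer becomes a false positive.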