Event Correlation

In today's interconnected world, network management is critically important. Those who maintain the network need to quickly pinpoint and fix any problem, whether it's a malfunctioning mail daemon or a damaged fiber-optic link.

Luckily, almost every part of a modern network provides data about what it's doing:

• Operating systems log system and security events.

• Servers keep records of what they do.

• Applications log errors, warnings and failures.

• Firewalls and virtual private network gateways record traffic deemed suspicious.

• Network routers and switches watch what goes on between network segments.

• Messaging systems forward alerts, such as Simple Network Management Protocol (SNMP) traps, to a central management console.

Besides monitoring their own behavior, all these devices and management programs receive and relay messages from other network systems, leading to duplicate alerts. A single failure or problem can generate a blizzard of event messages.

The more complex the network and the more distributed its applications, the more event messages, alarms and alerts its devices will generate. In the end, far more data is generated than anyone can easily scan, and it's scattered all over the network.

In 2000, Chris Jordan, a security manager at Computer Sciences Corp., wrote in a posting to the SecurityFocus Web site, "OC-12 connections can generate about 850 megabytes of event data in an hour." (OC-12 is a fiber-optic connection with bandwidth of 622Mbit/sec.) That translates into more than 600GB of data per month, or more than 7TB a year -- just for logs and alerts related to a single network link.
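The arithmetic behind those figures is easy to check. The short Python calculation below assumes decimal units (1GB = 1,000MB, 1TB = 1,000GB) and a 30-day month; both are assumptions made here, not details from Jordan's posting.

```python
# Rough check of the log-volume figures for a single OC-12 link.
# Assumes decimal units and a 30-day month.
MB_PER_HOUR = 850

mb_per_month = MB_PER_HOUR * 24 * 30   # 612,000 MB
mb_per_year = MB_PER_HOUR * 24 * 365   # 7,446,000 MB

gb_per_month = mb_per_month / 1_000
tb_per_year = mb_per_year / 1_000_000

print(f"{gb_per_month:.0f} GB per month")  # 612 GB per month
print(f"{tb_per_year:.1f} TB per year")    # 7.4 TB per year
```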

"IT managers spend 60% to 90% of their time resolving problems just with simple diagnostics," says Dennis Drogseth, vice president of Enterprise Management Associates Inc., an analyst and market research consultancy in Portsmouth, N.H.

Event correlation simplifies and speeds the monitoring of network events by consolidating alerts and error logs into a short, easy-to-understand package. A network administrator can deal with, say, 25 events produced by cross-referencing intrusion alerts against firewall entries and host/asset databases much more efficiently than with 10,000 mostly normal log entries.
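The cross-referencing step can be sketched in a few lines of Python. This is a minimal illustration, not any product's method: the `Alert` record, the set of firewall-permitted connections, the asset inventory and the signature-to-platform table are all hypothetical stand-ins for real data sources.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Alert:
    src: str        # attacking address
    dst: str        # targeted host
    signature: str  # IDS rule that fired (hypothetical names)

def correlate(alerts, permitted, assets, signature_targets):
    """Keep only alerts that matter: the firewall let the traffic through,
    the target is a known host, and the attack applies to that host's OS."""
    relevant = []
    for a in alerts:
        if (a.src, a.dst) not in permitted:
            continue  # blocked at the firewall, so no real exposure
        host_os = assets.get(a.dst)
        if host_os is None:
            continue  # host is not in the asset database
        if host_os not in signature_targets.get(a.signature, set()):
            continue  # exploit does not affect this platform
        relevant.append(a)
    return relevant

# Hypothetical sample data
alerts = [
    Alert("203.0.113.5", "10.0.0.10", "IIS-exploit"),
    Alert("203.0.113.5", "10.0.0.11", "IIS-exploit"),
    Alert("198.51.100.7", "10.0.0.12", "IIS-exploit"),
]
permitted = {("203.0.113.5", "10.0.0.10"), ("203.0.113.5", "10.0.0.11")}
assets = {"10.0.0.10": "windows", "10.0.0.11": "linux", "10.0.0.12": "windows"}
signature_targets = {"IIS-exploit": {"windows"}}

print(correlate(alerts, permitted, assets, signature_targets))  # only the first alert survives
```

Each filter discards alerts that can be ruled out from another data source, which is how thousands of raw entries shrink to a short list worth a human's attention.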

The benefits can be very real: more efficient use of staff time and skills, as well as the prevention of revenue loss resulting from downtime.

According to Marcus Ranum, an independent computer and communications security consultant in Woodbine, Md., "Correlation is something everyone wants, but nobody even knows what it is. It's like liberty or free beer -- everyone thinks it's a great idea and we should all have it, but there's no road map for getting from here to there." Still, a variety of technologies and operations are associated with event correlation:

Compression takes multiple occurrences of the same event, examines them for duplicate information, removes redundancies and reports them as a single event. So 1,000 "route failed" alerts become a single alert that says "route failed 1,000 times."
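A minimal Python sketch of compression, assuming each event has already been reduced to a plain message string (real events also carry timestamps and source addresses that a production tool would have to compare):

```python
from collections import Counter

def compress(events):
    """Collapse repeated identical events into one report each, with a tally.
    Reports come out in the order each event was first seen."""
    counts = Counter(events)
    return [
        f"{event} {n:,} times" if n > 1 else event
        for event, n in counts.items()
    ]

events = ["route failed"] * 1000 + ["link down"]
print(compress(events))  # ['route failed 1,000 times', 'link down']
```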

Counting reports a specified number of similar events as one. It differs from compression in two ways: the events need only be similar rather than identical, and a report is triggered only when their number reaches a threshold.
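Counting can be sketched the same way. Here the grouping function, the event fields and the threshold of five are hypothetical choices for illustration; the sketch resets the tally after each report, which is one of several reasonable designs.

```python
from collections import defaultdict

def count_with_threshold(events, key, threshold):
    """Report once for every `threshold` similar events.

    `key` decides which events count as similar; groups that never reach
    the threshold produce no report at all."""
    counts = defaultdict(int)
    reports = []
    for event in events:
        group = key(event)
        counts[group] += 1
        if counts[group] == threshold:
            reports.append(f"{threshold} x {group}")
            counts[group] = 0  # start counting toward the next report
    return reports

# Failed logins for different users are "similar" though not identical.
events = [{"type": "login failed", "user": u}
          for u in ["alice", "bob", "carol", "dave", "erin"]]
events.append({"type": "disk warning", "host": "db1"})
print(count_with_threshold(events, key=lambda e: e["type"], threshold=5))
# ['5 x login failed']
```

The single disk warning never reaches the threshold, so it produces no report.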

Suppression associates priorities with alarms and lets the system suppress an alarm for a lower-priority event if a higher-priority event has occurred.
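One way to sketch suppression in Python follows. Two assumptions are made that the description above does not specify: suppression is scoped to alarms from the same device, and priorities are numbers where larger means more important.

```python
def suppress(alarms):
    """Drop an alarm when a higher-priority alarm has already been raised
    for the same device. Larger numbers mean higher priority."""
    highest = {}  # device -> highest priority seen so far
    kept = []
    for device, name, priority in alarms:
        if priority < highest.get(device, priority):
            continue  # already covered by a more important alarm
        highest[device] = priority  # priority >= anything earlier for this device
        kept.append((device, name, priority))
    return kept

alarms = [
    ("sw1", "power supply failed", 5),
    ("sw1", "port 3 down", 1),   # suppressed: sw1 already has a priority-5 alarm
    ("sw2", "port 7 down", 1),   # kept: nothing more important on sw2
]
print(suppress(alarms))
```

This sketch only suppresses alarms that arrive after the higher-priority one; withdrawing earlier, lower-priority alarms retroactively would need a buffering window.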

Generalization associates alarms with a higher-level event, which is what gets reported. This is useful for correlating alarms from multiple ports on the same switch or router when the device itself fails. You don't need to see each port failure if you can determine that the entire unit has problems.
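The sketch below assumes a hypothetical topology table mapping each switch to its ports, and treats a device as failed only when every one of its ports reports a failure. Real products use richer topology models and other rules; this is just one simple instance of the idea.

```python
from collections import defaultdict

# Hypothetical topology: which ports belong to which switch.
TOPOLOGY = {"sw1": {"sw1/1", "sw1/2", "sw1/3"}, "sw2": {"sw2/1", "sw2/2"}}

def generalize(port_failures, topology):
    """Replace per-port failures with one unit-level alarm when every port
    on a device has failed; otherwise pass the port alarms through.
    Assumes every reported port appears in the topology table."""
    parent = {p: dev for dev, ports in topology.items() for p in ports}
    failed = defaultdict(set)
    for port in port_failures:
        failed[parent[port]].add(port)
    report = []
    for dev, ports in failed.items():
        if ports == topology[dev]:
            report.append(f"{dev} failed")
        else:
            report.extend(f"{p} down" for p in sorted(ports))
    return report

print(generalize(["sw1/1", "sw1/2", "sw1/3", "sw2/1"], TOPOLOGY))
# ['sw1 failed', 'sw2/1 down']
```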

Time-based correlation can be helpful in establishing causality -- for instance, tracing a connectivity problem to a failed piece of hardware. Correlating events that have specific time-based relationships often yields more information, and some problems can be diagnosed only through such temporal correlation. Examples of time-based relationships include the following:

• Event A is followed by Event B.

• This is the first Event A since the most recent Event B.

• Event A follows Event B within two minutes.

• Event A wasn't observed within Interval I.
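Two of these relationships can be expressed as simple Python checks. The sketch assumes events are (timestamp-in-seconds, name) pairs already sorted by time, and that A and B are different event types; the event names are hypothetical.

```python
def follows_within(events, a, b, window):
    """True if some event `a` occurs no more than `window` seconds after
    an event `b` ("Event A follows Event B within two minutes")."""
    last_b = None
    for t, name in events:  # events must be sorted by timestamp
        if name == b:
            last_b = t  # the most recent B is the closest candidate
        elif name == a and last_b is not None and t - last_b <= window:
            return True
    return False

def absent_during(events, a, start, end):
    """True if no event `a` was observed in [start, end]
    ("Event A wasn't observed within Interval I")."""
    return not any(name == a and start <= t <= end for t, name in events)

events = [(0, "fan failed"), (45, "cpu overheat"), (600, "link down")]
print(follows_within(events, "cpu overheat", "fan failed", 120))  # True
print(follows_within(events, "link down", "fan failed", 120))     # False
print(absent_during(events, "heartbeat", 0, 300))                 # True
```

An overheating CPU 45 seconds after a fan failure suggests a causal link; a missing heartbeat over an interval is the kind of problem that only a temporal rule can detect, since no event is ever generated for it.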

Winning Users Over

"Event correlation, in its basic form, is becoming almost a commodity product," says Drogseth. "Where you want to reduce the number of events and alarms and have some level of topological awareness to eliminate duplicates -- that's pretty standard and working today." Buyers are skeptical, but Drogseth says many event-correlation products work well out of the box or with minimal customization.

"There are any number of more sophisticated approaches that are all about diagnostics, finding out what is the real cause of a problem," Drogseth says. "Here, you have to address a lot more complexity in network infrastructure." When you start trying to isolate a problem and get at the true root cause, he says, "you have a high level of investment and complexity, but also a high level of value."

Kay is a Computerworld contributing writer in Worcester, Mass. Contact him at russkay@charter.net.

Copyright © 2003 IDG Communications, Inc.