November 7, 2005
(Computerworld)
In a previous column, I outlined the five steps in the problem management process: detection, identification, determination, resolution and reflection. I explained that new technologies will be required to help IT administrators determine the root causes of IT problems.
But how do IT administrators determine them today? One of the most critical steps is to go through the vast ocean of log data generated by the IT infrastructure, including router, switch, firewall, server, Web server and application logs. The logs contain a wealth of information, such as debugging or error data, that's not available anywhere else.
Searching through these logs is usually the most effective way for IT administrators to determine the root cause of a problem. The traditional way of doing it, however, with grep or other Unix tools, is extremely inefficient for several reasons.
First of all, the Unix tools aren't made to perform extremely fast searches through logs. They're designed to search through files line by line, starting from the top, until some entry matches the search requirement. This can be an excruciating process because the matching entries could be buried at the end of a huge log file that may be tens of gigabytes.
Second, these tools have no concept of time. Almost all log entries carry time stamps, so they can tell IT administrators what happened at a specific moment. However, the Unix tools don't recognize these time stamps. Imagine that you want to search only a five-minute block near the end of the day, and the log file holds a day's worth of data. The Unix tools will start searching from the top of the file and will take a very long time to reach the desired five-minute block.
Third, there is no effective way to express sophisticated search commands with the tools. For example, to search for log entries that contain either the word login or the word logon, but not root or zhenjl, the IT administrator would have to chain together a set of sophisticated Unix commands.
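To make the cost of that approach concrete, here is a minimal Python sketch of what a chain of line-by-line filters amounts to. The sample log lines and the names SAMPLE_LOG, WANTED, EXCLUDED and scan are invented for illustration; the point is only that every line is examined, in order, no matter where the matches sit.

```python
import re

# Hypothetical sample log lines; the format is invented for illustration.
SAMPLE_LOG = [
    "Nov  7 09:00:01 host sshd: login accepted for alice",
    "Nov  7 09:00:05 host sshd: login accepted for root",
    "Nov  7 09:01:12 host ftpd: logon failed for bob",
    "Nov  7 09:02:30 host sshd: logon accepted for zhenjl",
    "Nov  7 09:03:44 host kernel: link up on eth0",
]

WANTED = re.compile(r"\b(login|logon)\b")
EXCLUDED = re.compile(r"\b(root|zhenjl)\b")


def scan(lines):
    """Check every line, top to bottom, as a chain of line filters would."""
    for line in lines:
        if WANTED.search(line) and not EXCLUDED.search(line):
            yield line


matches = list(scan(SAMPLE_LOG))
for line in matches:
    print(line)
```

The work grows with the size of the file rather than with the number of matching entries, which is the inefficiency described above.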
Finally, when the IT administrator finds the desired log entry, there's no efficient way to drill down to find out what came before or after that entry. It's extremely important to do that, since the root cause usually comes before the actual problem.
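If log entries are addressable by position, the drill-down becomes a simple slice around the match. The following is a rough sketch under that assumption; with_context, its parameters and the sample entries are hypothetical and not taken from any particular product.

```python
def with_context(lines, predicate, before=2, after=2):
    """Yield (preceding, match, following) for each line satisfying predicate."""
    lines = list(lines)
    for i, line in enumerate(lines):
        if predicate(line):
            preceding = lines[max(0, i - before):i]
            following = lines[i + 1:i + 1 + after]
            yield preceding, line, following


# Hypothetical entries: the real cause appears before the visible error.
SAMPLE = [
    "09:00:01 disk usage at 91%",
    "09:00:40 disk usage at 97%",
    "09:01:02 write failed: no space left on device",
    "09:01:03 ERROR database went offline",
    "09:01:10 restarting database",
]

results = list(with_context(SAMPLE, lambda l: "ERROR" in l, before=2, after=1))
for preceding, match, following in results:
    print("before:", preceding)
    print("match: ", match)
    print("after: ", following)
```

In this invented sample, the entries just before the error (the disk filling up) are the ones that point to the root cause.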
So, how can IT administrators make root-cause analysis more efficient?
The answer lies in the hottest technology in the IT world today: full-text indexed Boolean search. This technology is used by all search engines, including Google, Yahoo and MSN, to return the desired results to users in seconds. By applying full-text indexing to all log data, IT administrators can reap the same benefit.
Indexing log data means breaking each log entry into tokens, or words. The location of each token is stored in a special dictionary called an index file. Creating this index file doesn't affect the integrity of the original logs, which is a critical requirement for regulatory compliance. Care must also be taken to record the time stamp of the entry each token came from, so that searches can be narrowed by time.
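A minimal sketch of such an index, often called an inverted index, might look like the following. It assumes a made-up entry format of (time stamp, message) pairs and keeps everything in memory; a real log search engine would persist the index to disk and handle many time stamp formats. The names ENTRIES, TOKEN and build_index are illustrative only.

```python
import re
from collections import defaultdict
from datetime import datetime

# Hypothetical entries: (time stamp, message). The format is an assumption.
ENTRIES = [
    ("2005-11-07 09:00:01", "sshd login accepted for alice"),
    ("2005-11-07 09:00:05", "sshd login accepted for root"),
    ("2005-11-07 23:55:12", "ftpd logon failed for bob"),
]

TOKEN = re.compile(r"[a-z0-9_]+")


def build_index(entries):
    """Return (index, timestamps).

    index maps each token to the set of entry ids that contain it;
    timestamps maps each entry id to its parsed time stamp.
    The original entries are only read, never modified.
    """
    index = defaultdict(set)
    timestamps = []
    for entry_id, (stamp, message) in enumerate(entries):
        timestamps.append(datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"))
        for token in TOKEN.findall(message.lower()):
            index[token].add(entry_id)
    return index, timestamps


index, timestamps = build_index(ENTRIES)
print(sorted(index["login"]))
print(timestamps[2])
```

Looking up a word is now a dictionary access rather than a pass over every log line.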
Once this index file is created, IT administrators can then instantly locate the desired logs through Boolean searches. Such a search can consist of a single word or multiple words. These words can be connected using special Boolean operators such as AND, OR, NOT and parentheses. Wild cards such as "*" can also be applied to these words.
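One simple way to support a wild card, sketched here under the assumption that the index is a plain mapping from token to entry ids, is to expand the pattern against the index's vocabulary and combine the results. INDEX and expand are hypothetical names, and the tiny index is invented for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical index: token -> ids of the log entries containing it.
INDEX = {
    "login": {0, 1},
    "logon": {2},
    "logout": {3},
    "root": {1},
}


def expand(pattern):
    """Return ids of entries containing any token that matches the pattern."""
    ids = set()
    for token, entry_ids in INDEX.items():
        if fnmatchcase(token, pattern):
            ids |= entry_ids
    return ids


all_log_words = expand("log*")
logo_words = expand("logo*")
print(sorted(all_log_words), sorted(logo_words))
```

The pattern is matched against the list of distinct tokens, which is typically far smaller than the logs themselves.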
For example, to locate logs that match the earlier criteria, the IT administrator would simply create a Boolean search expression like "(login OR logon) AND NOT (root OR zhenjl)". IT admins can further restrict the result set by adding new conditions with Boolean operators. When a time frame is added to the search, the log search engine also jumps directly to the desired time without scanning all the unnecessary logs.
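With an index in place, the Boolean operators map naturally onto set operations, and a time frame can be located by binary search rather than by reading from the top. The sketch below assumes entries are stored in time order and uses seconds since midnight as a stand-in for real time stamps; ENTRIES, any_of and in_window are hypothetical names, and no query parser is included.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

# Hypothetical entries sorted by time: (seconds since midnight, message).
ENTRIES = [
    (32401, "sshd login accepted for alice"),   # 09:00:01
    (32405, "sshd login accepted for root"),    # 09:00:05
    (32472, "ftpd logon failed for bob"),       # 09:01:12
    (86100, "sshd logon accepted for zhenjl"),  # 23:55:00
    (86220, "sshd login accepted for carol"),   # 23:57:00
]

TIMES = [t for t, _ in ENTRIES]
INDEX = defaultdict(set)
for entry_id, (_, message) in enumerate(ENTRIES):
    for token in message.lower().split():
        INDEX[token].add(entry_id)


def any_of(*tokens):
    """OR: ids of entries containing at least one of the tokens."""
    ids = set()
    for token in tokens:
        ids |= INDEX.get(token, set())
    return ids


def in_window(start, end):
    """Ids of entries with start <= time <= end, found by binary search."""
    lo = bisect_left(TIMES, start)
    hi = bisect_right(TIMES, end)
    return set(range(lo, hi))


# (login OR logon) AND NOT (root OR zhenjl)
hits = any_of("login", "logon") - any_of("root", "zhenjl")

# The same query restricted to the last five minutes of the day.
late_hits = hits & in_window(86100, 86399)

print(sorted(hits), sorted(late_hits))
```

Here AND NOT becomes set difference and the time restriction becomes an intersection, so narrowing a search means combining small sets of entry ids instead of rescanning the logs.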
Unlike the Web search engines, where index files are updated only periodically, the log search engines can index data in real time, usually tens of thousands of log messages per second. This real-time component gives IT administrators the fastest way to locate what they need.
Search technology has been applied in many ways, including Web search and desktop search. Applying it to the vast ocean of infrastructure log data will no doubt make root-cause and forensic analysis much easier for IT administrators.
Jian Zhen, CISM, CISSP, is a freelance writer in the San Francisco Bay area. He has been in the information security industry for nine years.
He can be reached at zhenjl@gmail.com or www.crypt0.net/blog.