November 6, 2003
(Computerworld)
The term relational database is almost superfluous these days. After all, every major commercial database product (Oracle, Sybase, DB2) is based on the same underlying relational model. There are many good reasons for the dominance of the relational model over older models, such as the hierarchical and network models, and over more recent ones, such as object databases: The underlying theory allows relational databases to represent complex data sets elegantly and to perform flexible, arbitrary queries. Formalized methodologies for schema design can go a long way toward eliminating data repetition and inconsistencies by splitting information into multiple tables with narrowly focused uses.
Given the ubiquity and obvious strengths of relational databases, it's understandable that application developers reflexively gravitate toward them whenever serious data storage is required. Security application vendors are no exception; a number of them have built and released products that attempt to store millions of firewall, VPN and intrusion-detection system records, in addition to server and application logs, in one of these relational database products.
So after making substantial investments in hardware and software licenses, customers are all too often dismayed to find that their security event analysis application cannot manage the large volumes of security-related data needed for proper incident investigation or regulatory compliance.
The most immediate issue in log data management is that events can't be inserted into the database as fast as they are generated. A number of factors contribute to the database insertion bottleneck, including index construction, commit and rollback space, queries, data deletion, and database management and maintenance.
Index issues
Queries against relational databases perform best when they access the data via an index. The natural tendency is for developers to try to optimize performance by constructing indices to handle many, if not most, of the anticipated user queries.
In log management, however, this strategy backfires because the volume of data being inserted is far greater than the number of queries against it, and every index on a table must be updated each time a record is inserted.
The efficiency of an index also degrades over time as records are added to and deleted from the underlying table. A routine task in system administration is the periodic rebuilding of indices to offset this problem. In high-volume log management applications, the interval between rebuilds may be surprisingly short, often as little as one week. Rebuilding the indices on a table containing many millions of records will leave that table effectively unusable for both insertion and querying for many hours.
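The insertion penalty of index maintenance can be seen in a small experiment. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for a commercial database; the `events` table, its columns, and the helper names `make_events` and `timed_load` are hypothetical and chosen only for illustration. It loads the same synthetic records into a table with no indices and into a table with an index on every column, and reports the elapsed time for each.

```python
import sqlite3
import time

# Hypothetical column set for a firewall-style event table.
COLUMNS = ("ts", "src_ip", "dst_ip", "dst_port", "action")


def make_events(n):
    """Generate n synthetic event records (illustrative data only)."""
    return [
        (
            1_000_000 + i,
            f"10.0.{i % 256}.{(i * 7) % 256}",
            f"192.168.1.{i % 254 + 1}",
            1024 + i % 5000,
            "deny" if i % 3 == 0 else "allow",
        )
        for i in range(n)
    ]


def timed_load(events, indexed_columns):
    """Insert events into a fresh in-memory table; return (row_count, seconds)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events "
        "(ts INTEGER, src_ip TEXT, dst_ip TEXT, dst_port INTEGER, action TEXT)"
    )
    # Each index created here must be updated on every subsequent insert.
    for col in indexed_columns:
        conn.execute(f"CREATE INDEX idx_{col} ON events ({col})")

    start = time.perf_counter()
    with conn:  # one transaction for the whole load
        conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", events)
    elapsed = time.perf_counter() - start

    (count,) = conn.execute("SELECT COUNT(*) FROM events").fetchone()
    conn.close()
    return count, elapsed


events = make_events(50_000)
bare_count, bare_secs = timed_load(events, [])
indexed_count, indexed_secs = timed_load(events, COLUMNS)
print(f"no indices:   {bare_count} rows in {bare_secs:.3f}s")
print(f"five indices: {indexed_count} rows in {indexed_secs:.3f}s")
```

The absolute timings depend on the machine and the database engine, so the numbers are only indicative; the point of the sketch is that both loads store identical rows while the indexed table does extra work per insert.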
Commit and rollback issues
The transactional nature of relational databases is indispensable in many applications. The textbook example of a financial transfer illustrates this requirement: Deducting an amount from one account and adding the amount to a different account involves two linked operations that must be treated as a single transaction. Ideally, both operations succeed, but if one fails, the other must fail too. It is unacceptable for one to occur without the other.
In the context of log management, however, no such dependencies exist. The inability to load a log record into the system does not invalidate any other log record loaded along with it. Security applications that feed into relational databases may choose to minimize the number of transactions by loading fewer, larger batches of records. This consumes a large amount of rollback space and forces an expensive recovery should a single record in a large batch fail. Loading many smaller batches has a much lower recovery overhead but generally results in lower throughput, as the increased number of transactions slows the overall process.
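The transfer example can be sketched in a few lines. This is a minimal illustration, assuming Python's built-in sqlite3 module; the `accounts` table and the helpers `open_ledger`, `transfer` and `balances` are hypothetical names introduced here. The deduction and the credit run inside one transaction, so a failure in the credit step undoes the deduction as well.

```python
import sqlite3


def open_ledger():
    """Create an in-memory ledger with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)",
        [("checking", 100), ("savings", 0)],
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates commit or neither does."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?",
            (amount, src),
        )
        credited = conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?",
            (amount, dst),
        )
        if credited.rowcount != 1:
            # The credit did not happen, so the deduction must not stand.
            raise LookupError(f"unknown destination account {dst!r}")


def balances(conn):
    """Return {account_id: balance} for every account."""
    return dict(conn.execute("SELECT id, balance FROM accounts"))


ledger = open_ledger()
transfer(ledger, "checking", "savings", 30)
try:
    transfer(ledger, "checking", "missing", 50)  # credit step fails
except LookupError as exc:
    print("rolled back:", exc)
print(balances(ledger))
```

After the failed second transfer, the balance of the source account is the same as it was before the attempt, because the rollback discarded the deduction that had already been applied.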
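The batch-size trade-off can be made concrete with a short sketch. It assumes Python's built-in sqlite3 module, a hypothetical `events` table whose NOT NULL constraints reject malformed records, and a hypothetical recovery policy (replaying a failed batch one record per transaction) that is only one of several possible choices. The function counts how many transactions were committed, so the effect of different batch sizes is visible.

```python
import sqlite3

INSERT_SQL = "INSERT INTO events (ts, message) VALUES (?, ?)"


def open_event_store():
    """Create an in-memory table that rejects records with missing fields."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (ts INTEGER NOT NULL, message TEXT NOT NULL)"
    )
    return conn


def load_in_batches(conn, records, batch_size):
    """Load a list of (ts, message) tuples, one transaction per batch.

    Returns (loaded, rejected, commits).
    """
    loaded = rejected = commits = 0
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        try:
            with conn:  # the whole batch commits or rolls back together
                conn.executemany(INSERT_SQL, batch)
            loaded += len(batch)
            commits += 1
        except sqlite3.Error:
            # One bad record rolled back the entire batch. Recover by
            # replaying the batch with one record per transaction.
            for record in batch:
                try:
                    with conn:
                        conn.execute(INSERT_SQL, record)
                    loaded += 1
                    commits += 1
                except sqlite3.Error:
                    rejected += 1
    return loaded, rejected, commits


# Ten records, one of them malformed (missing timestamp).
records = [(1000 + i, f"event {i}") for i in range(10)]
records[3] = (None, "corrupt record")

for size in (10, 5, 1):
    store = open_event_store()
    print(size, load_in_batches(store, records, size))
    store.close()
```

With large batches, a single bad record forces the whole batch to be rolled back and replayed; with a batch size of one, nothing is replayed but every record pays for its own transaction.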
A combination of filtering the amount of data sent to the database and buying sufficiently powerful hardware to handle the insertion of the reduced data raises other issues, including the following:
Kevin Hanrahan is director of Security Strategy at San Francisco-based Addamark Technologies, a provider of information security solutions that enable rapid detection and investigation of damaging attacks, particularly insider abuse, long-term attacks and other suspicious activity inside the firewall.