Review: Graylog delivers open-source log management for the dedicated do-it-yourselfer

In most big security breaches, there’s a familiar thread: something funny was going on, but no one noticed. The information was in the logs, but no one was looking for it. Logs from the hundreds or thousands of network devices are the secret sauce to problem solving, security alerting and performance and capacity management. Gathering logs together, analyzing them, reporting and alerting on them is a basic part of good IT practice.

Graylog is an open-source log management tool, complete with a three-tier architecture, super-scalable storage (based on Elasticsearch), an easy-to-use Web interface and a powerful toolkit to parse messages, build ad-hoc dashboards and set alerts on logs. It sounds great, and our testing shows that the functionality provided is solid and reliable, with one caveat: You have to be willing to do a lot of work yourself.

If you regard writing regular expressions as a fun afternoon’s work, if you have a fairly limited set of homogeneous network devices and log sources that you understand pretty well and if you don’t need reporting or correlation, then Graylog is a great choice at an excellent price. However, Graylog has significant limitations compared to commercial products. The money you save in not paying for a commercial log management tool (such as Splunk), you may eat up in your own time investment to customize and adapt Graylog to your environment.

Easy to install and get started

We started by downloading the pre-built VM based on Ubuntu 14.04 (“Trusty”) provided by the Graylog team. Graylog has worked hard to make installation easy. Our testing focused on Graylog for network and security monitoring, which means that most of the log messages we had to feed it came either via SYSLOG or were in the Windows Event Log. Since we had an existing SYSLOG receiver on a dedicated IP address, we were able to swap Graylog in quickly, and we were up and receiving SYSLOG messages within an hour.

The pre-built VMs are nearly suitable for production deployment, and even support scalability features (such as separating out front-end from back-end services). The Graylog VM does not include a system management control panel. It’s not an appliance but a handily pre-installed system, so maintaining the underlying operating system from the command line is your responsibility. We ran Graylog on our production VMware cluster for several months, and went through two upgrades of the software, all without problems or data loss.

If you need more control, you also have the option to install Graylog’s components yourself onto popular Linux distributions (Ubuntu, Debian and CentOS) or use common DevOps orchestration tools (Puppet and Chef are supported, as well as Ansible and Vagrant) to automate putting Graylog on your own supported Linux platform. Graylog does not run on Windows.

Although the Unix command line is needed for system management, most Graylog operations are handled through the web-based GUI. Graylog’s GUI has a modern feel, and we found the design of the GUI intuitive and easy-to-learn. The Graylog team has taken care to make navigation easy and makes full use of the power of the browser — without requiring add-ons such as Java or Adobe Flash.

Evaluating Graylog: Starting with data collection

To evaluate Graylog, we considered the entire cycle of log management and came up with six main areas to test: data collection and storage, parsing and normalization, searching, reporting, correlation and analysis of data, and alerting. Although Graylog has the potential for high-performance scalability through its choice of database and its multi-tier architecture, we did not test for performance.

Before any log management tool can be useful, you have to get your logs into it. Graylog has the obligatory SYSLOG receiver (both TCP and UDP) as well as an agent-based Windows Event Log collector that we focused on in our testing. The Graylog team has defined their own structured logging format called “GELF” (Graylog Extended Log Format), which it offers as a way for applications (and other log collection tools) to pre-parse their log data before sending it off to Graylog. Graylog also includes an HTTP-based interface for direct submission of log data.
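To make the idea of pre-parsed log data concrete, here is a minimal Python sketch of what an application might build before handing a message to Graylog. It assumes the GELF 1.1 payload layout (a JSON object with `version`, `host`, `short_message`, `timestamp` and `level`, plus custom fields prefixed with an underscore). The helper names `build_gelf` and `encode_gelf`, the host name and the field names are our own illustrations, not part of any Graylog library.

```python
import json
import time


def build_gelf(host, short_message, level=6, **extra):
    """Build a GELF 1.1 payload as a dict (illustrative helper, not a Graylog API).

    Custom fields are prefixed with an underscore, as the GELF format expects.
    """
    payload = {
        "version": "1.1",
        "host": host,
        "short_message": short_message,
        "timestamp": time.time(),  # seconds since the Unix epoch
        "level": level,            # syslog severity; 6 = informational
    }
    for key, value in extra.items():
        if key == "id":
            # The GELF format reserves the additional field name "_id".
            raise ValueError("'id' is not allowed as an additional field")
        payload["_" + key] = value
    return payload


def encode_gelf(payload):
    """Serialize the payload to UTF-8 JSON bytes, ready to hand to a transport."""
    return json.dumps(payload).encode("utf-8")


# Hypothetical firewall event with two pre-parsed fields.
message = build_gelf(
    "fw01.example.com",
    "Denied TCP connection",
    level=4,
    src_ip="192.0.2.10",
    dst_port=22,
)
datagram = encode_gelf(message)
```

The resulting bytes could be sent to a GELF UDP input with Python's standard `socket` module. Port 12201 is the one commonly used for GELF, but check the configuration of your own input. Because the fields arrive already named, Graylog has nothing left to parse.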

Graylog has a particularly strong set of log collection tools for the development community, with documented support for pulling logs from Ruby on Rails, Heroku (an application deployment environment), and using JSON. From our point of view as network and security managers, these were not particularly interesting, but Graylog is not aimed only at network and security log management — as a general tool, it can also meet the needs of operations teams looking to monitor complex interconnected application environments.

For other common formats, such as logs being stored in databases, flat text files, non-agent-based Windows, or anything else you might want, Graylog relies on the open source world. For example, if you don’t want to install a Graylog agent on all of your Windows servers, you’ve got to build an agentless solution yourself. To facilitate the linkage between contributed software and the core product, the company has created a “Graylog Marketplace”. That’s great if you find what you want, but if you don’t, then you have to write it, adapt something else, or assemble parts yourself.

For example, in our testing, we needed to link Graylog to our production Sophos Enterprise Console to monitor anti-malware events and other issues. Sophos writes to a database, but Graylog can’t read from that. So we ended up installing an additional product from Sophos to extract messages and write them to text files, bringing up an SMB file share to get the files to the Graylog world, and then using another open source tool, Logstash, to translate the Sophos format to GELF format and send it over to Graylog. Did it work? Yes. Did we have to learn multiple tools and build our own pattern recognizers to make it all work? Yes, and it took a lot of tools and a lot of work.

Of course, watching what other people in the community have done with Graylog can give you new ideas of ways to collect and monitor data. For example, one of the contributed tools does HTTP polling and collects statistics, turning these polls into log messages, which can then be displayed or analyzed using Graylog’s dashboard tools, making Graylog even more useful.

As with most open source products, if the path you are following is one that multiple people have followed before, life is easy with Graylog. You’ll find reliable tools, good documentation, and some support. But if you want to collect logs in many different ways, Graylog gives you the framework to solve your problem, but leaves the details — and the long-term support — up to you.

Parsing and normalization

One of the most important features of modern log management tools is the ability to parse and normalize log messages before storing them. Without good parsing, the log management tool has little benefit, because the log messages have no real meaning. If you can’t tell, for example, source IP addresses from destination IP addresses, or network port numbers from disk error counts, then you haven’t moved out of the 1990s in log management.

Graylog has two main ways of handling the parsing problem. The first, and obviously their favorite, is GELF, their own structured log format. When messages are sent in using GELF format, the data are pre-parsed, and Graylog can store them in its databases quickly and with high accuracy. Unfortunately, Graylog doesn’t support CEF (the almost-identical structured log format pushed by ArcSight nearly 10 years ago, adopted by many network and security vendors), so enterprises that normalize their logs into CEF format hoping for a drop-in replacement are out of luck.

The second way to handle log parsing is to do it yourself as the messages enter Graylog, using what Graylog calls “Extractors.” These are essentially regular expressions you write (or download from the Graylog Marketplace if your device is supported) that parse out messages. While these extractors are not hard to write, Graylog’s facilities for controlling the extractors need work. For example, you can’t have different extractors called into play depending on who is sending the message, which means that — for all practical purposes — each different type of device in your network has to send to a different Graylog input process (either on a different IP address or a different port number). This adds to the complexity of deployment.
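For readers who have not written this kind of pattern before, the sketch below shows the general idea in Python. It is not Graylog's extractor configuration (extractors are set up through Graylog itself), and the log line, field names and `extract_fields` helper are hypothetical; every real device has its own format, which is exactly why the work adds up.

```python
import re

# Hypothetical firewall log line; real devices each use their own layout.
LINE = (
    "Oct 11 22:14:15 fw01 kernel: DENY TCP "
    "src=192.0.2.10 dst=198.51.100.7 spt=51515 dpt=22"
)

# Named groups play the role of the fields an extractor would populate.
PATTERN = re.compile(
    r"(?P<action>ALLOW|DENY)\s+(?P<proto>\w+)\s+"
    r"src=(?P<src_ip>\d{1,3}(?:\.\d{1,3}){3})\s+"
    r"dst=(?P<dst_ip>\d{1,3}(?:\.\d{1,3}){3})\s+"
    r"spt=(?P<src_port>\d+)\s+dpt=(?P<dst_port>\d+)"
)


def extract_fields(line):
    """Return the parsed fields as a dict, or an empty dict if nothing matches."""
    match = PATTERN.search(line)
    if match is None:
        return {}
    fields = match.groupdict()
    # Store ports as integers so they can be compared numerically later.
    fields["src_port"] = int(fields["src_port"])
    fields["dst_port"] = int(fields["dst_port"])
    return fields


fields = extract_fields(LINE)
```

A pattern like this only makes sense for messages from one kind of device, which illustrates why each device type effectively needs its own input.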

Overall, Graylog’s parsing is very generic, and this results in one of the main weaknesses of Graylog: the lack of any sort of data dictionary or information schema. When deploying Graylog, the network manager is thrust into the role of developing an information framework for reading their own log data and making use of it. Although the IT industry has decades of experience with this, and over a dozen SIEM products have hit the marketplace, Graylog doesn’t include any of that accumulated knowledge. The network manager has to decide everything from which fields to capture, to what names to use, to where DNS lookups should occur, to which messages are important and which ones are not.

Some network managers may find this a fascinating exercise and dive deep in the time-consuming task of trying to understand and categorize every element of every log they get from every device. But for many others, this is less attractive. Part of the value of a log management tool is some semantic knowledge of what logs mean and how they should be interpreted. Graylog doesn’t provide that, and the Graylog Marketplace doesn’t help — as there is no effort made to keep the various extractors and GELF tools in synchronization with each other.

Normalization is another area where Graylog leaves you almost entirely on your own. The one type of normalization built-in is date formats, where Graylog has a well-designed converter that helps to parse date formats between different log systems into a single consistent timestamp.
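The principle behind date normalization can be sketched in a few lines of Python. This is not Graylog's converter, only an illustration of the technique: try a list of known formats and emit a single UTC timestamp. The three formats and the decision to treat timestamps without a zone as UTC are our assumptions.

```python
from datetime import datetime, timezone

# Candidate formats, tried in order (an illustrative list, not Graylog's).
FORMATS = [
    "%Y-%m-%dT%H:%M:%S%z",   # ISO 8601 style with numeric offset
    "%d/%b/%Y:%H:%M:%S %z",  # web server access log style
    "%Y-%m-%d %H:%M:%S",     # no zone information; assumed to be UTC
]


def normalize_timestamp(text):
    """Parse text with the first matching format and return ISO 8601 in UTC."""
    for fmt in FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format did not match; try the next one
        if parsed.tzinfo is None:
            # Assumption: sources that omit a zone are logging in UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc).isoformat()
    raise ValueError("unrecognized timestamp format: " + repr(text))
```

With these formats, `2016-03-01T10:00:00+0200` and `01/Mar/2016:10:00:00 +0200` both normalize to the same UTC instant, which is what lets messages from different systems line up on one timeline.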

But for all other normalization, Graylog depends on another open source tool, Drools, a business rules management system. Graylog borrows the Drools Expert business rules engine, which can be used to further parse and normalize messages.

Has Graylog brought together the pieces that are needed to build a good parsing and normalization system? Yes, definitely. Has your typical network manager been cast adrift in a time-consuming sea of confusion compared to other log management tools? Equally true. Graylog brings a completely blank slate to the table. From our point of view, too blank. There must be a way to build the accumulated best practices of network and security log management into the tool without turning this into a do-it-yourself nightmare, but Graylog has not found it.

Searching, reporting, and analysis

Once you’ve got your messages into Graylog, searching and reporting are the main ways to get everything out. At this time, Graylog does not have an internal reporting interface. You’re welcome to write your own reports using the documented APIs, which is in keeping with the do-it-yourself style of the product.

We didn’t test performance of Graylog for searching, but in our test system with gigabytes of messages going back several months, most queries returned results immediately. Even when we used large time windows, answers were nearly instantaneous.

Searching using the GUI is quick and intuitive. You select a time window (such as “in the last hour”, or a more specific absolute time if you want), then just type what you’re looking for, including wildcards. To make use of the parsed fields in a message, you simply specify the field and then an operator. Graylog includes the normal ones, such as “is equal,” “contains,” or “is between,” as well as Boolean operators such as AND, OR, and NOT, but also some more esoteric ones, such as proximity searches. For example, “network world”~3 matches messages which have “network” and “world” in any order, within 3 words of each other.

The Graylog GUI is advanced, but could use some work to match similar tools. Drilling down into results by further refining a search query is easy, but requires you to go back to the keyboard and type in field names and values, rather than simply clicking on a value to add it to the query.

When a search returns results, Graylog immediately offers up a timeline histogram, which can be helpful in identifying patterns or finding when an event occurred, as well as the option to save a query or move results to a dashboard.

We used Graylog multiple times to debug problems in our network, track email messages, and investigate security events. Each time we were able to find the information we needed using the search language, and having all our logs together in a single place saved a lot of time in tracking problems that spanned multiple systems.

Dashboards are Graylog’s main analysis tool, and they are an excellent way to get visibility into what is happening across many messages. Dashboards are designed to aggregate data. For example, after sending our email security gateway logs to Graylog, we could build a dashboard that included a strip graph showing incoming levels of spam, viruses, and other threats.
