One topic most information security professionals hate is metrics. When I was working as a security analyst and later as a security engineer, I always hated it when my boss asked me to pull logs or query the Remedy ticketing system and then use the metrics to report on various aspects of what I did or how well our security infrastructure was protecting the company.
Now that I’m the manager, I’m the one who asks for such measurements every quarter, and my guys snicker at me. But as a manager, I’ve learned to appreciate what metrics can tell me — and what they can help me get.
Every major department within the IT organization of my company is responsible for rolling quarterly data up into key performance indicators (KPIs). An Internet search will turn up several definitions of KPIs, some saying they are interchangeable with balanced scorecard metrics. In my mind, they are the same thing: metrics.
At my company, we use KPIs to measure the effectiveness and efficiency of key areas of IT. For example, to measure the effectiveness of our help desk, we report on the number of trouble tickets closed within a certain amount of time. To measure the efficiency of our e-mail infrastructure, we report on the average time it takes for an e-mail message to be delivered. (That measurement also serves as a capacity planning tool. When e-mail delivery time increases, we know we need to increase our e-mail delivery capacity, whether in the form of network bandwidth, hardware or memory.) We report on the percentage of backups that fail, the percentage of IT projects delivered on time and so on.
In the security realm, calculating a balanced scorecard can be somewhat difficult, and alternative metrics are needed. You might think that if the company didn’t get hacked, wasn’t robbed of intellectual property, didn’t suffer a denial-of-service attack or have malicious code such as viruses, worms and Trojan horse programs propagating through its network, then the metrics would be easy — clearly, we’re doing our job. Well, it doesn’t work like that, and I have to be creative in my quarterly metrics.
My predecessor tackled this problem by using metrics that, for all intents and purposes, seemed to have been pulled out of some CISSP book. They provided nothing meaningful. For example, one metric measured the percentage of the infrastructure meeting ISO 17799 compliance. First of all, the company has never been through an ISO 17799 certification. Second, metrics on the elements of the ISO 17799 framework don’t really tell you anything.
Early on, I sat down with my team to brainstorm metrics, focusing on the tools we already have in place that could be leveraged for that purpose.
For example, we use Trend Micro tools to combat viruses, and our deployment includes Trend Micro Control Manager (TMCM), which can report on various aspects of the Trend Micro antivirus agents. So we use TMCM to report the percentage of managed desktops that run the recommended antivirus version and pattern file. I have combined that with reporting from our Microsoft Systems Management Server to show how many managed systems meet the recommended security patch level, so antivirus currency and security patch status appear in one measurement. (I say “managed” because some workstations and servers by design don’t have antivirus tools installed, nor are they regularly patched. These include systems in a lab environment and certain engineering systems.)
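For anyone who wants to try a combined measurement like this, here is a minimal sketch in Python. It assumes each tool's report has already been exported as a plain list of host names; the host names and the function name are hypothetical, and nothing here reflects the actual export format of TMCM or Systems Management Server. It counts a managed system as compliant only if it shows up on both lists, which is one plausible way to fold the two reports into a single number.

```python
def combined_compliance(managed, av_ok, patch_ok):
    """Return the percentage of managed hosts that pass both checks.

    managed  -- hosts that are supposed to be covered (lab boxes excluded)
    av_ok    -- hosts reported as current on antivirus version and pattern
    patch_ok -- hosts reported at the recommended patch level
    """
    managed = set(managed)
    if not managed:
        return 0.0
    # A host counts only if it is managed AND appears in both reports.
    compliant = managed & set(av_ok) & set(patch_ok)
    return 100.0 * len(compliant) / len(managed)


# Hypothetical sample data, not taken from any real report.
managed_hosts = ["ws-001", "ws-002", "ws-003", "ws-004"]
av_current = ["ws-001", "ws-002", "ws-003", "lab-01"]  # lab-01 is unmanaged
patched = ["ws-001", "ws-003", "ws-004"]

pct = combined_compliance(managed_hosts, av_current, patched)
print(f"{pct:.1f}% of managed systems are current on antivirus and patches")
```

Dividing by the managed population rather than by every host on the network keeps lab and engineering systems from dragging the figure down.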
And since we use Remedy for our help desk and general trouble-ticketing system, we report on average time to resolve a security incident.
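The arithmetic behind an average-time-to-resolve figure is simple enough to sketch. The Python below assumes the security tickets have been exported as pairs of opened and closed timestamps; the function name and the sample dates are made up for illustration, and no Remedy API or export format is implied. Tickets that are still open are left out of the average, which is an assumption worth stating whenever the number is reported.

```python
from datetime import datetime


def mean_hours_to_resolve(tickets):
    """Average open-to-close time in hours, skipping tickets still open.

    tickets -- iterable of (opened, closed) datetimes; closed may be None
    Returns None when no ticket has been closed yet.
    """
    durations = [
        (closed - opened).total_seconds() / 3600
        for opened, closed in tickets
        if closed is not None
    ]
    if not durations:
        return None
    return sum(durations) / len(durations)


# Hypothetical incidents: one resolved in 4 hours, one in 24, one still open.
sample = [
    (datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 8, 13, 0)),
    (datetime(2024, 1, 9, 10, 0), datetime(2024, 1, 10, 10, 0)),
    (datetime(2024, 1, 11, 8, 0), None),
]

print(mean_hours_to_resolve(sample))
```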
Another tool came to hand recently when we implemented a Juniper intrusion-detection sensor. Now that it’s tuned, we use it not just to detect hacking activity, but also to report on violations of our unacceptable-use policy — we can measure the percentage of network traffic that represents unauthorized use. This metric is complicated by the fact that the definition of unacceptable use changes frequently as new or recently discovered applications or technologies are added. When I report this metric, I have to add the caveat that what we measure is always shifting. Still, it’s meaningful, and our CIO loves to see how many users are spending their day downloading MP3s or using Skype to talk to their college buddies.
I also report the percentage of our network covered by the intrusion-detection system (IDS). I like to use this figure to help justify additional IDS sensors. Currently, we’re able to monitor only 40% or so of our overall network traffic. My goal, of course, is 100%. My CIO raised his eyebrows when he saw evidence that a lot of employees are using Skype, even though we’re monitoring only 40% of our overall bandwidth. For a second, I thought he was going to break out his checkbook and cut me a check to purchase additional sensors. It didn’t happen this time, but I’ll have to work on that for next quarter.
I could expand my reporting to include information from Tripwire, SmartFilter, firewall logs and other sources. But the other departments report on only four or five items, and I feel that is an appropriate number for my group as well, to the relief of my metric-hating employees. So far, I’ve had the opportunity to report these metrics only once, but upper management was very interested in observing the changes over time. I expect that as the quarters accumulate, I’ll be able to use these metrics both to measure security effectiveness and to make a case for more resources, so that I can hire more engineers and purchase more infrastructure.
What Do You Think?
This week’s journal is written by a real security manager, “Mathias Thurman,” whose name and employer have been disguised for obvious reasons. Contact him at email@example.com, or join the discussions in our security blogs: computerworld.com/blogs/security. To find a complete archive of our Security Manager’s Journals, go online to computerworld.com/secjournal.