Measuring system performance may sound simple enough, but IT professionals know there's more to it than meets the eye. In this excerpt from Systems Performance: Enterprise and the Cloud, performance engineer Brendan Gregg offers advice on what not to do when benchmarking.
Benchmarking done well is not a fire-and-forget activity. Benchmark tools provide numbers, but those numbers may not reflect what you think they do, and any conclusions you draw from them may therefore be bogus.
With casual benchmarking, you may benchmark A, but actually measure B and conclude you've measured C.
Benchmarking well requires rigor to check what is actually measured and an understanding of what was tested to form valid conclusions.
For example, many tools claim or imply that they measure disk performance but actually test file system performance. The difference between these two can be orders of magnitude, as file systems employ caching and buffering to substitute disk I/O with memory I/O. Even though the benchmark tool may be functioning correctly and testing the file system, your conclusions about the disks will be wildly incorrect.
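The cache effect described above is easy to demonstrate. The following sketch (file paths and sizes are arbitrary choices, not from the original text) times a buffered write, which may complete at memory speed, against a write followed by fsync(2), which waits for the data to reach the device:

```python
import os
import time

def write_mb_per_sec(path, size_mb, sync):
    """Write size_mb of zeros and return apparent throughput in MB/s.
    With sync=False the timing may reflect the page cache, not the disk."""
    chunk = b"\0" * (1024 * 1024)
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(size_mb):
            f.write(chunk)
        f.flush()
        if sync:
            os.fsync(f.fileno())  # force data to stable storage before stopping the clock
    return size_mb / (time.perf_counter() - start)

buffered = write_mb_per_sec("/tmp/bench_buffered.dat", 64, sync=False)
synced = write_mb_per_sec("/tmp/bench_synced.dat", 64, sync=True)
print(f"buffered: {buffered:.0f} MB/s, fsync'd: {synced:.0f} MB/s")
os.remove("/tmp/bench_buffered.dat")
os.remove("/tmp/bench_synced.dat")
```

On a typical system the buffered figure can be far higher than the fsync'd one; reporting the former as "disk performance" is exactly the benchmark-A-measure-B mistake described above.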
Understanding benchmarks is particularly difficult for the beginner, who has no instinct for whether numbers are suspicious or not. If you bought a thermometer that showed the temperature of the room you're in as 1,000 degrees Fahrenheit, you'd immediately know that something was amiss. The same isn't true of benchmarks, which produce numbers that are probably unfamiliar to you.
It may be tempting to believe that a popular benchmarking tool is trustworthy, especially if it is open source and has been around for a long time. The misconception that popularity implies validity is known as argumentum ad populum (Latin for "appeal to the people").
Analyzing the benchmarks you're using is time-consuming and requires expertise to perform properly. And, for a popular benchmark, it may seem wasteful to analyze what surely must be valid.
The problem isn't even necessarily with the benchmark software -- although bugs do happen -- but with the interpretation of the benchmark's results.
Numbers without analysis
Bare benchmark results, provided with no analytical details, can be a sign that the author is inexperienced and has assumed that the benchmark results are trustworthy and final. Often, such results are just the beginning of an investigation, one that finds them to be wrong or confusing.
Every benchmark number should be accompanied by a description of the limit encountered and the analysis performed. I've summarized the risk this way: If you've spent less than a week studying a benchmark result, it's probably wrong.
Much of my book focuses on analyzing performance, which should be carried out during benchmarking. In cases where you don't have time for careful analysis, it is a good idea to list the assumptions that you haven't had time to check and include them with the results, for example:
- Assuming the benchmark tool isn't buggy
- Assuming the disk I/O test actually measures disk I/O
- Assuming the benchmark tool drove disk I/O to its limit, as intended
- Assuming this type of disk I/O is relevant for this application
This can become a to-do list, if the benchmark result is later deemed important enough to spend more effort on.
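Some of these assumptions can be checked cheaply rather than listed. As one sketch (Linux-only, since it reads /proc/diskstats), the second and third assumptions can be tested by diffing the kernel's sectors-written counter around the benchmark run; a near-zero delta would suggest the file system cache served the workload instead of the disks:

```python
def sectors_written():
    """Sum sectors written across all block devices from /proc/diskstats.
    Field 10 (index 9) is sectors written; see the kernel's iostats documentation."""
    total = 0
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            total += int(fields[9])
    return total

before = sectors_written()
# ... run the disk I/O benchmark here ...
after = sectors_written()
print(f"sectors written during run: {after - before}")
```

This doesn't prove the benchmark drove the disks to their limit, but it turns "assuming the disk I/O test actually measures disk I/O" from an assumption into an observation.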