
Supercomputer race: It's a tricky task to boost (and measure) system speed

The Top500 list is always climbing to new heights. Can we believe the hype?

By Gary Anthes
September 22, 2008 12:00 PM ET

Computerworld - Every June and November, with fanfare lacking only in actual drum rolls and trumpet blasts, a new list of the world's fastest supercomputers is revealed. Vendors brag, and the media reach for analogies such as "It would take a patient person with a handheld calculator x number of years (think millennia) to do what this hunk of hardware can spit out in one second."

The latest Top500 list, released in June, was seen as especially noteworthy because it marked the scaling of computing's then-current Mount Everest -- the petaflops barrier. Dubbed "Roadrunner" by its users, a computer built by IBM for Los Alamos National Laboratory in New Mexico topped the list of the 500 fastest computers, burning up the bytes at 1.026 petaflops, or more than 1,000 trillion arithmetic operations per second.

A computer to die for if you are a supercomputer user for whom no machine ever seems fast enough? Maybe not.

Richard Loft, NCAR

Richard Loft, director of supercomputing research at the National Center for Atmospheric Research in Boulder, Colo., says he doubts Roadrunner would operate at more than 2% of its peak rated performance on NCAR's ocean and climate models. That would bring it in at 20 to 30 teraflops -- no slouch, to be sure, but so far short of that petaflops goal as to seem more worthy of the nickname "Roadwalker."
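The arithmetic behind that estimate is easy to check. The following is a rough sketch in Python, assuming the 1.026-petaflops Linpack figure quoted above as the reference rate; the article does not say which peak figure Loft had in mind, and the function name is purely illustrative.

```python
PETA = 1e15  # operations per second in one petaflops
TERA = 1e12  # operations per second in one teraflops


def sustained_teraflops(reference_petaflops, fraction):
    """Return the sustained rate, in teraflops, at a given fraction of a reference rate."""
    return reference_petaflops * PETA * fraction / TERA


# Assumption: use the Linpack result from the June list as the reference rate.
linpack_petaflops = 1.026
estimate = sustained_teraflops(linpack_petaflops, 0.02)
print(f"{estimate:.1f} teraflops at 2% of {linpack_petaflops} petaflops")
```

At 2% of the Linpack figure, the result lands at the low end of the 20-to-30-teraflops range Loft describes.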

"The Top500 list is only useful in telling you the absolute upper bound of the capabilities of the computers," Loft says. "It's not useful in terms of telling you their utility in real scientific calculations."

The problem, he says, is that placement on the Top500 list is determined by performance on a decades-old benchmark called Linpack, a Fortran program that measures how fast processors perform floating-point math operations -- for example, multiplying two long decimal numbers. It's not meant to rate the overall performance of an application, especially one that does a lot of interprocessor communication or memory access.

Test bench

If the Top500 list of supercomputers is based on such a narrow criterion -- floating-point performance -- why isn't a better benchmark used?

"I believe that you could come out with a measure that's more useful for what we do," says Richard Loft, director of research and development for supercomputing at NCAR, which models the Earth's oceans and atmosphere.

Such a measure, he says, might already exist in something called the HPC Challenge Benchmark, a suite of tests sponsored by the Defense Advanced Research Projects Agency and developed at the University of Tennessee. The tests consist of the Linpack floating-point benchmark plus six others that measure things such as integer math, memory updates, sustainable memory bandwidth and interprocessor communications.

"The good news -- or the bad news -- about the Linpack number is it's a single number," says University of Tennessee professor Jack Dongarra, who chose the benchmark years ago to rank computers for his list of "fastest" computers.

"If I knew the user's application, I might be able to say that you need to weight various metrics in a certain way to compare systems," he says. "But that reduction is hard to do, and I couldn't do it in the abstract for the Top500 list."

Moreover, users and vendors seeking fame high on the list take elaborate pains to tweak their systems to run Linpack as fast as possible -- a tactic permitted by the people who compile the list.

The computer models at NCAR simulate the flow of fluids over time by dividing a big space -- the Pacific Ocean, say -- into huge grids and assigning each cell or group of cells in the grid to a specific processor in a supercomputer.

It's nice to have that processor run very fast, of course, but getting to the end of a 100-year climate simulation requires an enormous number of memory accesses by a processor, something that typically happens much more slowly. In addition, some applications require passing many messages from one processor to another, which can also be relatively slow.
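To make the grid idea concrete, here is a minimal sketch in Python of one common way to carve a two-dimensional grid into rectangular blocks, one per processor, and to count the edge cells each block would have to swap with its neighbors at every time step. It is not NCAR's code: the grid size, the 4-by-4 processor layout, the one-cell-deep exchange and the function names are all assumptions chosen for illustration.

```python
def block_decompose(nx, ny, px, py):
    """Split an nx-by-ny grid into px*py rectangular blocks, one per processor.

    Returns a dict mapping processor coordinates (i, j) to half-open
    cell ranges (x0, x1, y0, y1).
    """
    def bounds(n, parts, k):
        # Spread any remainder cells over the first `extra` parts.
        base, extra = divmod(n, parts)
        start = k * base + min(k, extra)
        return start, start + base + (1 if k < extra else 0)

    blocks = {}
    for i in range(px):
        for j in range(py):
            x0, x1 = bounds(nx, px, i)
            y0, y1 = bounds(ny, py, j)
            blocks[(i, j)] = (x0, x1, y0, y1)
    return blocks


def halo_cells(block, nx, ny):
    """Count edge cells a block exchanges with neighbors (one-cell halo, no wraparound)."""
    x0, x1, y0, y1 = block
    width, height = x1 - x0, y1 - y0
    count = 0
    if x0 > 0:
        count += height  # neighbor on the left
    if x1 < nx:
        count += height  # neighbor on the right
    if y0 > 0:
        count += width   # neighbor below
    if y1 < ny:
        count += width   # neighbor above
    return count


# Hypothetical example: a 1,000-by-1,000 grid on a 4-by-4 processor layout.
blocks = block_decompose(1000, 1000, 4, 4)
interior = blocks[(1, 1)]
cells = (interior[1] - interior[0]) * (interior[3] - interior[2])
print(cells, "cells computed,", halo_cells(interior, 1000, 1000), "cells exchanged per step")
```

In this made-up layout, an interior processor owns 62,500 cells and trades 1,000 edge cells with its neighbors on every step. Adding processors shrinks each block, so the share of cells that must be communicated rises, which is one reason the network can matter more than raw arithmetic speed.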

So, for many applications, the bandwidth of the communications network inside the box is far more important than the floating-point performance of its processors. That's even more true for business applications, such as online search or transaction processing.

An even greater bottleneck can crop up in programs that can't easily be broken into uniform, parallel streams of instructions. If a processor gets more than its fair share of work, all the others may wait for it, reducing the overall performance of the machine as seen by the user. Linpack operates on the cells of matrices, and by making the matrices just the right size, users can keep every processor uniformly busy and thereby chalk up impressive performance ratings for the system overall.
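The cost of uneven work can be put in numbers with a simple model. The sketch below, in Python, assumes every processor must finish a step before any can start the next, so the step takes as long as the busiest processor; the work figures are invented for illustration and the function name is hypothetical.

```python
def parallel_efficiency(work):
    """Fraction of total processor time spent on useful work in one step.

    `work` lists the work units assigned to each processor. The step
    lasts as long as the most heavily loaded processor takes, and the
    others sit idle once they finish.
    """
    step_time = max(work)
    return sum(work) / (len(work) * step_time)


# Hypothetical loads for eight processors.
balanced = [100] * 8         # every processor gets the same share
skewed = [100] * 7 + [200]   # one processor gets double the work

print(parallel_efficiency(balanced))
print(parallel_efficiency(skewed))
```

Under this model, doubling the load on just one of eight processors cuts the machine's useful output to a little over half, which is the situation a carefully sized Linpack matrix avoids.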

IBM's Roadrunner supercomputer broke the petaflops barrier in June.

