Q&A: What Roadrunner's petaflop Top500 milestone is all about

Supercomputer clocks a peak performance of one quadrillion floating-point operations per second.

The Top500 list of the world's most powerful supercomputers passed a milestone today with the first system to achieve peak performance of 1 petaflop, or one quadrillion floating-point operations per second.

The system, called Roadrunner, was built by IBM for the U.S. Department of Energy's Los Alamos National Laboratory. It's based on an advanced version of the Cell processor used in Sony Corp.'s PlayStation 3, and its performance far outstrips that of the previous fastest system, another IBM computer that topped out at 478.2 teraflops.

Erich Strohmaier, a computer scientist at Lawrence Berkeley National Laboratory, was one of the founding editors of the Top500 list back in 1993. He talked with the IDG News Service ahead of the announcement about the performance gains the list has seen, the quad-core processors that are coming to dominate it and mistakes that can creep in when the list is put together. Following is an edited transcript:

Did you expect to see performance of a petaflop when you started this list?

No. Fifteen years ago, the big question was whether all 500 systems together would amount to 1 teraflop — and it was just above 1 teraflop, all 500 of them together.

Where does the performance of the IBM system come from? Is it mainly from the Cell processor, or advances somewhere else?

For the Roadrunner, it's a very dense package in terms of the computing power. The advanced Cell is important, with eight of those [cores] on a single processor, but it's also because it's tightly integrated. It's a blade system, so you get a lot of these in a rack.

Does that cut down on latency between the blades?

Yes, you lose that latency, and you also need that kind of packaging to cut down on the power. Using the Cell is one way, but using these tightly integrated blade systems is another way to control power.

Does someone go around and audit these systems? How do you know the results are genuine?

In the first place, it's an honor system. But, of course, for the big systems, we ask them to run the benchmark and we want to see the output files.

Have you ever caught anyone cheating?

Not on the larger-scale systems, but there are always mistakes on the list. Big companies don't really know precisely how much [equipment] they've sold where, because they don't track sales by system; they track them by components. So they know they've shipped so many blades of a certain type to the U.K., but they don't know how they are configured at customer sites. So yes, there have been mistakes made.

The more common mistake is that there are still systems on the list even though they have been decommissioned, because companies don't usually tell us when they shut their systems down. The thing that keeps the list healthy is that we lose, over a typical six-month interval, about 200 to 220 systems. So if we made some mistakes, they'll be out of the list very quickly. This time, we had record turnover: We lost 300 systems.

To what do you attribute that?

We've seen record turnovers a few times in the last three or four lists; that's a reflection of the market adopting the new quad-core processors. It's the dominant architecture in terms of how many cores are used, and it became that very quickly. Lots of these quad-cores are Intel Harpertown [the Xeon 5400 series]. There are already more Harpertown systems on the list than Clovertown [the earlier Xeon 5300 series]. It shows that our supercomputing community is ready to use those processors, and Linpack [the benchmark used to rank the supercomputers] can use a lot of features of the Harpertown and Clovertown quad-cores.

Intel seems to be increasingly dominant on the list. Is that because Advanced Micro Devices' quad-core chips were delayed coming to market?

Yes, I certainly agree with that. When AMD came out with their dual-core processors, they had a head start compared to Intel and gained a larger share of the list. In the last year to a year and a half, that has reversed and Intel's share has increased more. One reason has been the delays in AMD's quad-cores; the other is that for Clovertown, Intel introduced four floating-point operations per cycle per core. AMD was late doing that; they do it now with the new quad-core, but they didn't do it with the dual-core. And the Linpack benchmark and applications similar to it can use this four-floating-point feature, so they show up better on the list.
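To see why four floating-point operations per cycle per core matters for a ranking like this, here is a minimal sketch of the usual theoretical-peak arithmetic: cores × clock rate × floating-point operations per cycle per core. The function name `peak_gflops` and the 3.0 GHz clock are hypothetical, chosen only for illustration; they are not the specifications of any particular Intel or AMD chip.

```python
def peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in GFLOPS: cores x clock (GHz) x FP operations per cycle per core."""
    return cores * clock_ghz * flops_per_cycle


# Hypothetical quad-core chips at the same 3.0 GHz clock; only flops/cycle differs.
quad_4 = peak_gflops(cores=4, clock_ghz=3.0, flops_per_cycle=4)
quad_2 = peak_gflops(cores=4, clock_ghz=3.0, flops_per_cycle=2)

print(f"4 flops/cycle: {quad_4} GFLOPS per chip")
print(f"2 flops/cycle: {quad_2} GFLOPS per chip")
```

At the same clock and core count, doubling the operations per cycle doubles the theoretical peak (48 versus 24 GFLOPS here). A measured Linpack result always falls below this peak, so the sketch only bounds what a benchmark that exploits the feature can reach.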

Was it a scramble to get the results in on time? Some people wondered if Roadrunner would be ready.

Yes. For Roadrunner, it wasn't too much of a scramble, but they submitted it in time. But they still haven't used the full machine. The machine is in 18 segments, and they used only 17 of those, so they still have room to grow in terms of doing a new measurement and squeezing out a little more. It was amazing they managed to do the petaflop.

Why did you start the list? Is it just for fun, or does it serve another purpose?

It was fun, and also to get a handle on the market shares for supercomputers. My colleague Professor Hans Meuer started doing statistics in the late 1980s. That was the golden age of vector systems, so it was easy to count supercomputers: You just counted the vector systems. Then, in the early '90s, when the first parallel systems were becoming important, that method didn't work, so we scratched our heads and said, "What is the definition of a supercomputer?" We wanted a system that would scale over time because performance scales so quickly; it's scaled 10,000-fold since we started the list. So we said, "OK, let's pick a fixed number of computers that we know are supercomputers." There were 500 vector systems at the time, so that's why we picked the number 500.
