Supercomputer race: It's a tricky task to boost (and measure) system speed

The Top500 list is always climbing to new heights. Can we believe the hype?

Page 2 of 3

"As long as we continue to focus on peak floating-point performance, we are missing the actual hard problem that is holding up a lot of science," Loft says.

But the "hard problem" is getting the attention of computer and chip makers. IBM, which makes the Blue Gene family of supercomputers, has taken a systems approach.

Rather than cobbling together commodity processors with commodity interconnects like Ethernet or InfiniBand -- an approach that others have used -- IBM built five proprietary networks inside Blue Gene, each optimized for a specific kind of work and selectable by the programmer. Members of the Blue Gene family held the No. 1 and No. 2 positions on the Top500 list until June of this year.

Making memory access faster and smarter lets designers reduce the absolute amount of memory in a system, says Dave Turek, vice president of Deep Computing at IBM. As engineers work to build "exascale" computers (a thousand times faster than Roadrunner), that will be essential, he says.

"Going back a few years, you'd build a computer with the fastest processors possible and the most memory possible, and life was good," Turek says. "The question is, how much memory do you need to put on an exascale system? If you want to preserve the kinds of programming models you've had to this point, you'd better have a few hundred million dollars in your pocket to pay for that memory."

And it isn't just the purchase cost of memory that's a problem, Turek notes. Memory draws a lot of expensive power and generates a lot of heat that must be removed by expensive cooling systems.

Faster memory subsystems and faster interconnects will help, Turek says, but supercomputer users will also have to overhaul the programming methods that have evolved over the past 20 years if they hope to utilize the power of exascale computers.

He says users initially criticized Blue Gene for having too little memory, but they were eventually able to scale their applications to run well on 60,000 processors by changing the algorithms in their application code to use memory more sparingly.

Beep! Beep!

IBM calls Roadrunner, which cost Los Alamos $120 million, a "hybrid" architecture because it uses three kinds of processors. Basic computing is done on an off-the-shelf, 3,250-node network, with each node consisting of two dual-core Opteron microprocessors from Advanced Micro Devices Inc.

But Roadrunner's magic comes from a network of 13,000 "accelerators" in the form of Cell Broadband Engines originally developed for the Sony PlayStation 3 video game console and later enhanced by IBM. Each Cell chip contains an IBM Power processor core surrounded by eight simple processing elements.
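The processor counts quoted above can be tallied in a few lines. This is a back-of-the-envelope sketch using only the rounded figures in this article; the function names are made up for illustration, and the observation that the Cell count matches the Opteron core count is an inference from those figures, not something IBM or Los Alamos is quoted as saying here.

```python
# Rounded figures as quoted in the article.
NODES = 3_250              # nodes in the off-the-shelf network
OPTERONS_PER_NODE = 2      # two Opteron chips per node
CORES_PER_OPTERON = 2      # each Opteron is dual-core
CELLS = 13_000             # Cell "accelerators"
ELEMENTS_PER_CELL = 8      # simple processing elements around each Power core


def opteron_cores(nodes=NODES, chips=OPTERONS_PER_NODE, cores=CORES_PER_OPTERON):
    """Total Opteron cores across all nodes (hypothetical helper)."""
    return nodes * chips * cores


def cell_processing_elements(cells=CELLS, per_cell=ELEMENTS_PER_CELL):
    """Total simple processing elements across all Cells (hypothetical helper)."""
    return cells * per_cell


if __name__ == "__main__":
    print("Opteron cores:", opteron_cores())
    print("Cell processing elements:", cell_processing_elements())
    print("Cells per Opteron core:", CELLS / opteron_cores())
```

On these rounded numbers, the machine has as many Cell chips as Opteron cores, and roughly eight times as many simple processing elements.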

The Cells are optimized for image processing and mathematical operations, which are central to many scientific applications. A Cell can work on all the elements in a well-defined string or vector at once, which is ideal for the matrix math in the Linpack benchmark. Los Alamos says the Cells speed up computation by a factor of four to nine over what the Opterons alone could do. Nevertheless, the lab says it expects its production programs to run at sustained speeds of 20% to 50% of the celebrated 1-petaflops benchmark result.
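To put those percentages in absolute terms, here is a rough calculation based only on figures quoted in this article: the 1-petaflops Linpack result, the 20% to 50% sustained range, and the "thousand times faster than Roadrunner" description of exascale. The function names are illustrative, and treating the Linpack result as exactly 1,000 teraflops is a simplifying assumption.

```python
# Figures quoted in the article; the exact 1,000-teraflops value is an assumption.
LINPACK_TERAFLOPS = 1_000          # "1 petaflops" benchmark result
SUSTAINED_PERCENT = (20, 50)       # expected range for production programs
EXASCALE_FACTOR = 1_000            # "a thousand times faster than Roadrunner"


def sustained_range_teraflops(peak=LINPACK_TERAFLOPS, percents=SUSTAINED_PERCENT):
    """Return the (low, high) sustained speeds in teraflops."""
    low, high = percents
    return (peak * low / 100, peak * high / 100)


def exascale_petaflops(peak_teraflops=LINPACK_TERAFLOPS, factor=EXASCALE_FACTOR):
    """Return the exascale target in petaflops, scaled from the Linpack result."""
    return peak_teraflops * factor / 1_000


if __name__ == "__main__":
    low, high = sustained_range_teraflops()
    print(f"Sustained production speed: {low:.0f} to {high:.0f} teraflops")
    print(f"Exascale target: {exascale_petaflops():.0f} petaflops")
```

In other words, production codes are expected to sustain somewhere between a fifth and a half of the benchmark figure.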

The advantages of using three kinds of processors come at a cost. Just as the Linpack code had to be optimized for the machine, so do most other programs. A recent report from Los Alamos said this of the effort required to get an important simulation tool to run on Roadrunner: "Accelerating the Monte Carlo code called Milagro took many months, several false starts and modifications of 10% to 30% of the code." But in the end, the lab said, Milagro ran six times faster with the Cell chips than without them, and that was "a crucial achievement for the acceptance of Roadrunner."

Andrew White, Roadrunner project director at Los Alamos, told Computerworld that the effort to port and optimize code for Roadrunner was "less than we thought it would be" after programmers got some experience with it. A program with "tens of thousands of lines of code" is taking about one man-year to get going on the supercomputer, he said.
