There is a race to make supercomputers as powerful as possible to solve some of the world's most important problems, including climate change, the need for ultra-long-life car batteries, the operation of fusion reactors whose plasma reaches 150 million degrees Celsius and the creation of biofuels from weeds rather than corn.
Supercomputers allow researchers to create three-dimensional visualizations, not unlike a video game, and to run endless "what-if" scenarios in increasingly fine detail. But as big as they are today, supercomputers aren't big enough -- and a key topic for some of the estimated 11,000 people now gathering in Portland, Ore., for the 22nd annual supercomputing conference, SC09, will be the next performance goal: an exascale system.
Today's supercomputers are well short of exascale. The world's fastest system, according to the just-released Top500 list, is Jaguar, a Cray XT5 system at Oak Ridge National Laboratory with 224,256 processing cores from six-core Opteron chips made by Advanced Micro Devices Inc. (AMD). Jaguar is capable of a peak performance of 2.3 petaflops.
But Jaguar's record is just a blip, a fleeting benchmark. The U.S. Department of Energy has already begun holding workshops on building a system that's 1,000 times more powerful -- an exascale system, said Buddy Bland, project director at the Oak Ridge Leadership Computing Facility, which includes Jaguar. Exascale systems will be needed for high-resolution climate models, bioenergy products and smart grid development, as well as fusion energy design. The latter project is now under way in France: the International Thermonuclear Experimental Reactor, which the U.S. is co-developing.
"There are serious exascale-class problems that just cannot be solved in any reasonable amount of time with the computers that we have today," said Bland.
As amazing as supercomputing systems are, they remain primitive, and current designs soak up too much power, space and money. It wasn't until 1997 that ASCI Red at Sandia National Laboratories became the first system to break the teraflop barrier, reaching one trillion calculations per second. In 2008, IBM's Roadrunner at the Los Alamos National Laboratory achieved petaflop speed, or one thousand trillion (one quadrillion) sustained floating-point operations per second.
The Energy Department, which is responsible for funding many of the world's largest systems, wants two machines somewhere in the 2011-13 timeframe that will reach approximately 10 petaflops, said Bland.
But the next milestone now getting attention from planners is a system that can reach an exaflop, or a million trillion (one quintillion) calculations per second. That's 1,000 times faster than a petaflop.
The exaflop will likely arrive around 2018. The big performance leaps are expected to happen every decade or so. Moore's Law, which says the number of transistors on a chip will double every 18 months or so, helps to explain the roughly 10-year development period. But the problems involved in reaching exaflop scale go well beyond Moore's Law.
Jaguar uses 7 megawatts of power, or 7 million watts. An exascale system that used CPU processing cores alone might take 2 gigawatts, or 2 billion watts, says Dave Turek, IBM vice president of deep computing. "That's roughly the size of [a] medium-sized nuclear power plant. That's an untenable proposition for the future," he said.
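To get a feel for the scale of the power problem, the short Python sketch below scales Jaguar's quoted figures up to an exaflop. It assumes that power grows in direct proportion to peak performance, which is a simplification made here for illustration and not necessarily the basis of Turek's estimate; the function name is hypothetical. Under that assumption the result comes out at roughly 3 gigawatts, the same order of magnitude as the 2 gigawatts Turek cites.

```python
# Figures quoted in the article.
JAGUAR_PEAK_FLOPS = 2.3e15   # 2.3 petaflops peak performance
JAGUAR_POWER_WATTS = 7e6     # 7 megawatts
EXAFLOP = 1e18               # one quintillion calculations per second


def linear_power_estimate(target_flops, base_flops, base_watts):
    """Estimate power at target_flops, ASSUMING watts scale linearly with flops."""
    return base_watts * (target_flops / base_flops)


watts = linear_power_estimate(EXAFLOP, JAGUAR_PEAK_FLOPS, JAGUAR_POWER_WATTS)
print(f"{watts / 1e9:.1f} gigawatts")  # prints "3.0 gigawatts"
```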
IBM is in competition with Cray and other supercomputer makers, and finding a way to cut power demands for users is among the top problems. But the vendors still have to decide how to build these systems. Increasingly, they're likely to use hybrid approaches that combine co-processors or accelerators with CPUs in an effort to cut power.
The Roadrunner, which uses 3.9 megawatts, achieved just over one petaflop when it was announced. It uses a hybrid architecture that mixes AMD processors with Cell processors, each of which has nine separate processor cores: one PowerPC core and eight smaller co-processing units called synergistic processing elements. The use of co-processors, which include graphics processing units and field-programmable gate arrays, is intended to help cut power demand by moving some of the work off CPUs to processors that handle more specialized work.
Estimates on the size of exascale systems range from 10 million to 100 million cores. Turek believes the latter number is more likely.
"We think exascale is a 100 million-core kind of enterprise, and there doesn't seem any real pathway around it," said Turek. "Where the players in pursuit of exascale are today is [at] a state of investigation to see what the right model is. So if hybridization is the key, then what is the ratio of special-purpose cores to conventional cores?" he said.
These future systems will have to use less memory per core and will need more memory bandwidth. Systems running 100 million cores will continually see core failures and the tools for dealing with them will have to be rethought "in a dramatic kind of way," said Turek.
IBM's design goal for an exascale system is to limit it to 20 megawatts of power and keep it at a size of between 70 and 80 racks. Jaguar is built entirely of CPUs, but Bland also sees future systems as hybrids, and points to chip development by both Intel and AMD that combines CPUs and co-processors.
"We believe that using accelerators is going to be absolutely critical to any strategy to getting to exaflop computers," he said.
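IBM's 20-megawatt design goal can be set against Jaguar's quoted figures to see how large an efficiency gain it implies. The Python sketch below is a back-of-the-envelope comparison, not a vendor calculation: it uses Jaguar's peak (not sustained) performance, and the helper name is hypothetical. On those figures, an exaflop at 20 megawatts works out to 50 gigaflops per watt, roughly 150 times what Jaguar delivers.

```python
def flops_per_watt(flops, watts):
    """Energy efficiency: floating-point operations per second, per watt."""
    return flops / watts


# Figures quoted in the article.
jaguar = flops_per_watt(2.3e15, 7e6)   # 2.3 petaflops peak at 7 megawatts
target = flops_per_watt(1e18, 20e6)    # one exaflop at IBM's 20-megawatt goal

improvement = target / jaguar

print(f"Jaguar: {jaguar / 1e9:.2f} gigaflops per watt")          # 0.33
print(f"Exascale goal: {target / 1e9:.0f} gigaflops per watt")   # 50
print(f"Required improvement: about {improvement:.0f}x")         # about 152x
```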
Addison Snell, CEO of InterSect360 Research, an HPC research firm in Sunnyvale, Calif., said accelerators are capable of providing vast computational capability for specific applications, and the applications that can take advantage of them can move toward exascale first. Eventually, a general-purpose exascale system will arrive, "but special-purpose will probably come first."
Before exascale arrives, petaflop systems will continue to grow in size, and government-funded efforts to build massive systems seem to be on the rise. Fujitsu is planning a 10-petaflop computer in 2011 for Japan's Institute of Physical and Chemical Research, and China has now reached petaflop scale. Governments appear to be more willing to fund large systems, and an international race may be starting to build systems capable of solving some of the world's most pressing problems.