Supercomputers Make a Comeback

Proving that rumors of their demise were greatly exaggerated, very-high-performance computers show promise as tools to do the heavy lifting for e-commerce applications in mainstream IT shops.

It was the early 1990s, and supercomputing was an industry in decline. Its biggest customers - U.S. defense and intelligence agencies - cut back sharply on purchases of the costly behemoths following the end of the Cold War.

Meanwhile, with the speed of commodity microprocessors doubling every 18 months, the performance advantage of the multimillion-dollar, custom-built machines became less and less compelling.

The end seemed finally to come, symbolically at least, in 1996, when supercomputer grand master Seymour Cray died following a car accident. Cray was foremost among a handful of computer geniuses who for decades had designed the world's fastest computers.

But while these events played out on center stage, behind the scenes, nondefense scientific and engineering applications of supercomputing grew as companies learned how to mimic the physical world in digits. The Boeing Co. in Seattle used supercomputers to design its 777 airplane - which has 3 million parts - without relying on physical mock-ups. It was the first plane ever developed that way.

Now, supercomputers are going into mainstream corporate information technology shops, where they are doing the heavy lifting required for such tasks as processing immense and unpredictable Web transaction volumes. In addition, users have discovered that esoteric scientific algorithms can be used to mine huge databases for sales patterns, detect credit-card fraud and measure the risk of complex investment portfolios.

At Charles Schwab & Co., an IBM RS/6000 SP supercomputer with 2,000 processors does the Web serving and some of the back-end processing for all of the brokerage's e-commerce services. Connected by a high-speed switch, the processors can work together at more than a half-trillion operations per second. It's the 19th most powerful computer on the planet, according to a just-published list of the top 500 supercomputers (

The Schwab operating environment is one marked by high transaction volumes, unpredictable demand and the need to execute customer trades and update accounts almost instantly, says Adam Richards, a vice president at the San Francisco-based firm. As many as 95,000 users have been logged onto the Schwab site simultaneously, he says. "These computers were originally designed for large-scale, numerical calculations," Richards says, "but certain things they had to do - in order to make the calculations efficiently and ship results around - became very useful to us."

And Schwab will have to scale up the system even faster as customers move from simple online account inquiry to complex financial planning on Schwab systems. Things such as portfolio-risk assessments, which use simulations and complex mathematical calculations, will move from the hands of professional specialists into the hands of everyday Schwab customers, he says.
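The article does not describe Schwab's risk models. The sketch below shows Monte Carlo simulation, one common way a portfolio-risk figure such as value-at-risk (VaR) is estimated, to make "simulations" concrete. The function name, the portfolio weights, returns and volatilities are all hypothetical, and treating asset returns as independent normal draws is a simplifying assumption that real risk models do not make.

```python
import random

def monte_carlo_var(weights, mean_returns, volatilities, confidence=0.95,
                    trials=20_000, seed=42):
    """Estimate one-period value-at-risk as a fraction of portfolio value.

    Simplifying assumption: each asset's return is an independent normal
    draw; production risk models also account for correlations.
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    outcomes = []
    for _ in range(trials):
        # One simulated scenario: draw a return per asset, then weight it.
        portfolio_return = sum(
            w * rng.gauss(mu, sigma)
            for w, mu, sigma in zip(weights, mean_returns, volatilities)
        )
        outcomes.append(portfolio_return)
    outcomes.sort()  # worst scenarios first
    # The return that only (1 - confidence) of scenarios fall below,
    # reported as a positive loss.
    cutoff_index = int((1 - confidence) * trials)
    return -outcomes[cutoff_index]

# Hypothetical three-asset portfolio (per-period returns and volatilities).
var = monte_carlo_var([0.5, 0.3, 0.2], [0.01, 0.005, 0.002], [0.08, 0.05, 0.01])
print(f"95% VaR: {var:.1%} of portfolio value")
```

Each additional asset, scenario or confidence level multiplies the work, which is why moving such calculations into the hands of every customer adds computing load.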

Infiltrating Corporate IT

Smaby Group Inc., an IT consultancy in Minneapolis, says "complex scalable computing" in commercial settings is growing at 21% per year. The market for these high-performance systems, costing from $100,000 into the millions, was $8.6 billion last year and will be $14.8 billion in 2002.
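The two market figures can be checked against the stated growth rate with compound-growth arithmetic. The sketch assumes "last year" means 1999, so 2002 is three years out; the function names are illustrative.

```python
def implied_annual_growth(start_value, end_value, years):
    """Compound annual growth rate linking two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1

def project(start_value, rate, years):
    """Value after compounding `rate` annually for `years` years."""
    return start_value * (1 + rate) ** years

# Smaby's figures in billions of dollars (assumes a 1999 base year).
rate = implied_annual_growth(8.6, 14.8, 3)
projected = project(8.6, 0.21, 3)
print(f"Implied growth: {rate:.1%} per year")
print(f"$8.6B compounded at 21% for 3 years: ${projected:.1f}B")
```

The implied rate comes out at roughly 20% a year, close to Smaby's 21% figure; compounding a full 21% for three years would give about $15.2 billion.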

A decade ago, most supercomputers were at universities and government agencies. Now, more than half of the 500 fastest computers in the world are in corporations, says Jack Dongarra, a computer science professor and supercomputer expert at the University of Tennessee in Knoxville. Dongarra is one of the authors of the twice-yearly top 500 supercomputers list (see chart).

According to Dongarra, supercomputers are growing in power faster than predicted by Moore's Law, which, in the form usually quoted, says that the speed of microprocessors will double every 18 months. That's partly because supercomputers are being built with more and more processors. Indeed, he says there are no longer any single-processor systems on his top 500 list.

Additionally, Dongarra says, supercomputers are using better software, including smarter algorithms and better optimizing compilers.

The combination of faster processors, more processors and better software has been boosting supercomputer performance by three orders of magnitude every decade. Dongarra points out that in 1980, the fastest computers in the world worked at about 1 million floating-point operations per second (1 MFLOPS). Ten years later, top speeds were 1,000 times faster - 1 GFLOPS - and today they are 1,000 times faster still - at 1 TFLOPS.
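As a rough check on those figures, the sketch below compares the 18-month doubling rule with the 1,000-fold-per-decade gains Dongarra cites. Splitting the total gain into a processor-speed factor and a leftover factor for more processors and better software is a simplifying multiplicative model for illustration, not Dongarra's own analysis.

```python
def growth_over(months, doubling_period_months=18):
    """Speedup factor from doubling every `doubling_period_months`."""
    return 2 ** (months / doubling_period_months)

DECADE_MONTHS = 120
moore_factor = growth_over(DECADE_MONTHS)   # gain from faster chips alone
observed_factor = 1_000                     # three orders of magnitude

# Peak speeds cited by Dongarra, in floating-point operations per second.
peaks = {1980: 1e6, 1990: 1e9, 2000: 1e12}
years = sorted(peaks)
decade_gains = [peaks[b] / peaks[a] for a, b in zip(years, years[1:])]

# In this simple model, whatever the chips don't explain is attributed
# to more processors and better software.
extra_factor = observed_factor / moore_factor
print(f"18-month doubling over a decade: {moore_factor:.0f}x")
print(f"Left for parallelism and software: {extra_factor:.1f}x")
```

Under that model, faster chips alone supply roughly a 100-fold gain per decade, leaving about a factor of 10 to come from more processors and better software.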

Those spectacular improvements will continue, Dongarra says, so that in 2010, there will be machines running at 1,000 TFLOPS, or 1 petaFLOPS (PFLOPS). Operating at 1 quadrillion computations per second, such a computer could do in one second what it would take the entire population of the U.S. 50 days to do working nonstop with hand calculators.
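The hand-calculator comparison can be unpacked with simple arithmetic. The sketch assumes a U.S. population of about 281 million (the 2000 Census count, rounded); the article does not state which figure it used.

```python
PFLOPS = 1e15                    # operations per second
US_POPULATION = 281_000_000      # assumption: 2000 Census count, rounded
SECONDS_PER_DAY = 24 * 60 * 60

def ops_per_person_per_second(total_ops, people, days):
    """Rate each person must sustain to match `total_ops` in `days` days."""
    return total_ops / (people * days * SECONDS_PER_DAY)

# One second of petaFLOPS work, spread across the population for 50 days.
rate = ops_per_person_per_second(PFLOPS, US_POPULATION, 50)
print(f"Each person: {rate:.2f} calculations per second, nonstop")
```

Under that assumption, the comparison works out to each person completing a little under one calculation per second (about 0.8), around the clock for 50 days.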

At least one computer may jump the PFLOPS hurdle five years earlier - for a very specialized application. IBM recently announced it would build Blue Gene, a computing colossus for analyzing the behavior of human proteins. Blue Gene will have 1 million processors - 32 to a chip - each able to compute at 1 GFLOPS, for a combined 1 PFLOPS.
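Blue Gene's headline number is a theoretical peak: processor count times per-processor speed. The sketch reproduces that arithmetic from the article's figures; the helper name is illustrative, and a peak assumes every processor is busy all the time.

```python
GFLOPS = 1e9

BLUE_GENE_PROCESSORS = 1_000_000
PROCESSORS_PER_CHIP = 32
FLOPS_PER_PROCESSOR = 1 * GFLOPS

def aggregate_peak(processors, flops_per_processor):
    """Theoretical peak: every processor running at full speed at once."""
    return processors * flops_per_processor

peak = aggregate_peak(BLUE_GENE_PROCESSORS, FLOPS_PER_PROCESSOR)  # 1 PFLOPS
chips = BLUE_GENE_PROCESSORS // PROCESSORS_PER_CHIP               # chip count
per_chip = PROCESSORS_PER_CHIP * FLOPS_PER_PROCESSOR              # per-chip peak
print(f"Peak: {peak:.0e} FLOPS on {chips:,} chips, {per_chip / GFLOPS:.0f} GFLOPS each")
```

The 1 million processors therefore fit on 31,250 chips, each with a peak of 32 GFLOPS.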

Today, IBM comes in second on the top 500 list with a 5,808-processor behemoth at Lawrence Livermore National Laboratory. With 2.5 terabytes (TB) of memory and 75TB of disk storage, the system simulates the behavior of nuclear weapons at more than 2 TFLOPS. Called ASCI Blue Pacific, it's actually a "constellation" of three RS/6000 SP systems lashed together by a very-high-speed switch. A new supercomputer, ASCI White, will be shipped to the lab later this year and will then be the fastest in the world, IBM says.

ASCI White will use processor chips with copper interconnects and silicon-on-insulator technology, both of which boost performance, says Pete Ungaro, IBM's vice president for scientific and technical computing. Later, IBM will roll out the Power4 chip, a processor with two 64-bit, 1-GHz cores with 100GB/sec. of internal bandwidth.

But advancements in CPU speed aren't enough, Ungaro says. To improve overall system performance, IBM is developing faster ways to communicate among processors, memory and peripheral devices. On the software front, IBM Research is developing more efficient algorithms and faster libraries, he says.

Unfortunately, having more, faster processors doesn't ensure that users get a corresponding boost out of their machines. The biggest supercomputers today often operate at less than 10% of their theoretical maximums because their processors can't be kept busy all the time. That may be because the application software couldn't be or wasn't "parallelized" - structured so that every processor has its own code to run most of the time. Or it may be because of memory latency, which rears its ugly head when processors wait idly for data from memory or, worse, disk.
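The article does not give a formula for the parallelization problem, but Amdahl's law, a standard model swapped in here, shows why it is so severe on big machines: any fraction of the work that must run serially caps the speedup, however many processors are added. The sketch models only that effect, not memory latency, and the serial fractions are hypothetical; 5,808 is the ASCI Blue Pacific processor count.

```python
def amdahl_speedup(serial_fraction, processors):
    """Amdahl's law: speedup when `serial_fraction` of the work is serial."""
    return 1 / (serial_fraction + (1 - serial_fraction) / processors)

def efficiency(serial_fraction, processors):
    """Share of the machine's theoretical maximum actually delivered."""
    return amdahl_speedup(serial_fraction, processors) / processors

PROCESSORS = 5_808
for serial in (0.0, 0.001, 0.01, 0.05):
    print(f"{serial:.1%} serial work -> "
          f"{efficiency(serial, PROCESSORS):.1%} of peak")
```

In this model, leaving just 1% of the work serial holds a 5,808-processor machine to under 2% of its theoretical maximum, and even 0.1% serial work keeps it to roughly 15%.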

A solution to the latency problem is to add multiple levels of cache storage on or near the processor chip where commonly used data or instructions can be retrieved very rapidly. Systems today have three levels of cache, but more will be added, Dongarra says.
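A standard way to reason about multilevel caches, not described in the article, is average memory access time: each level costs its hit time plus its miss rate times the cost of the level below it. The sketch uses hypothetical latencies and miss rates chosen only for illustration; real values vary widely between systems.

```python
def average_access_time(levels, memory_latency):
    """Average memory access time, in cycles, for a cache hierarchy.

    `levels` lists (hit_time_cycles, miss_rate) pairs from the cache
    nearest the processor outward; last-level misses go to main memory.
    """
    time = memory_latency
    # Work from the outermost cache inward: each level costs its hit time
    # plus its miss rate times the cost of going one level further out.
    for hit_time, miss_rate in reversed(levels):
        time = hit_time + miss_rate * time
    return time

MEMORY_LATENCY = 200  # hypothetical main-memory latency, in cycles

one_level = average_access_time([(1, 0.10)], MEMORY_LATENCY)
three_levels = average_access_time(
    [(1, 0.10), (10, 0.40), (30, 0.50)], MEMORY_LATENCY)
print(f"One cache level: {one_level:.1f} cycles per access")
print(f"Three cache levels: {three_levels:.1f} cycles per access")
```

With these made-up numbers, adding second- and third-level caches cuts the average cost of a memory access from 21 cycles to about 7, because fewer requests pay the full trip to main memory.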

"We have failed to capitalize on the performance potential of scalable, parallel machines," says Ken Kennedy, director of the Center for High Performance Software at Rice University in Houston. Programmers haven't been good enough at structuring their code for parallel processing and have had difficulty optimizing their code for the complex memory hierarchies in many parallel systems, he says.

Advances to Come

But Kennedy says research shows promise for shifting those burdens from programmers to compilers and other tools. Compilers will produce code that more efficiently uses a processor's cache and local memories and do more global optimization by considering entire programs rather than individual routines, he says. And higher bandwidth inside machines will reduce memory latency, he predicts.
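Kennedy does not name specific techniques, but loop tiling (also called blocking) is one well-known transformation a compiler can apply to make better use of a cache: it reorders a loop nest so each small block of data is reused many times while it is still cached. The sketch shows the transformation on matrix multiplication; it is written in Python for readability, where the cache benefit itself would not show up, and the tile size of 32 is an arbitrary assumption.

```python
def matmul_naive(a, b):
    """Plain triple loop: streams through all of b for every row of a."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            aik = a[i][k]
            for j in range(p):
                c[i][j] += aik * b[k][j]
    return c

def matmul_tiled(a, b, tile=32):
    """Same product, computed block by block so each tile of a, b and c
    can be reused while it stays in cache (in a compiled language)."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, m, tile):
            for jj in range(0, p, tile):
                # Work only inside the current tile of each matrix.
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, m)):
                        aik = a[i][k]
                        for j in range(jj, min(jj + tile, p)):
                            c[i][j] += aik * b[k][j]
    return c
```

Both functions compute the same product; the point of tiling is that the compiler, not the programmer, can make this restructuring automatically.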

Tera Computer Co. (now Cray Inc., having bought the Cray supercomputer business from Silicon Graphics Inc. in April) devised another solution to the latency problem a decade ago. Its approach, called "multithreading," uses complex, custom-built processors that each contain up to 128 "virtual" processors working in parallel. A 16-processor machine working on 50 instruction threads per processor could execute 800 instructions at once, says Burton Smith, the company's chief scientist.
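The idea behind multithreading is latency hiding: while one thread waits for memory, the processor issues an instruction from another. The sketch is a simplified model of that idea, not Tera's actual design; it assumes every instruction is a memory operation with a fixed, hypothetical latency of 100 cycles, and that the processor issues from the next ready thread each cycle, round-robin.

```python
def utilization(threads, latency, cycles=10_000):
    """Fraction of cycles in which one processor issues an instruction.

    Simplified model (an assumption): after issuing, a thread waits
    `latency` cycles for memory before it can issue again.
    """
    ready_at = [0] * threads  # cycle at which each thread may next issue
    next_thread = 0
    issued = 0
    for cycle in range(cycles):
        # Round-robin search for a thread whose memory request is done.
        for offset in range(threads):
            t = (next_thread + offset) % threads
            if ready_at[t] <= cycle:
                ready_at[t] = cycle + latency
                issued += 1
                next_thread = (t + 1) % threads
                break
    return issued / cycles

LATENCY = 100  # hypothetical memory latency, in cycles
for n in (1, 16, 50, 128):
    print(f"{n:3d} threads: processor busy {utilization(n, LATENCY):.0%} of cycles")
```

In this model a single thread keeps the processor busy only 1% of the time, while enough threads to cover the memory latency keep it busy every cycle. By the same counting, 16 processors each juggling 50 threads have 16 x 50 = 800 instructions in flight.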

Processors that share a single, central memory and don't use caches make programming easier because the programmer doesn't have to worry about where data is. And memory latency is almost eliminated because all processors can access any part of memory at full processor speed.

"There have been many bugs in the software," says Wayne Pfeiffer, deputy director of the San Diego Supercomputer Center, and there have been only a few applications in which the Tera/Cray machine outperformed other supercomputers. Still, Pfeiffer says the multithreading concept holds much promise for simplifying programming and boosting performance.

Copyright © 2000 IDG Communications, Inc.
