Following their march from standard processors to dual-core and quad-core designs in 2006, Intel Corp. researchers have built an 80-core chip that performs more than a trillion floating-point operations per second (TFLOPS) while using less electricity than a modern desktop PC chip.
First described by Intel executives at a September trade show, the processor packs 80 cores onto a 275-square-millimeter, fingernail-size die and draws only 62 watts of power -- less than many modern desktop chips.
The company has no plans to bring this "teraFLOPS research chip" to market, but is using it to test new technologies such as high-bandwidth interconnects, energy management techniques and a tiled design method for building multicore chips, said Jerry Bautista, director of Intel's terascale research program. He spoke in a conference call with reporters on Friday before presenting technical details of the research at the International Solid-State Circuits Conference in San Francisco.
Intel engineers are also using the chip to explore new forms of tera-scale computing, in which future users could process terabytes of data on their desktops to perform real-time speech recognition, conduct multimedia data mining, play photorealistic games and interact with artificial intelligence.
Until now, that degree of computing performance has been available only to scientists and academics using machines such as ASCI Red, the TFLOPS supercomputer built by Intel Corp. and its partners in 1996 for U.S. government researchers at Sandia National Laboratories, near Albuquerque, N.M. That system delivered computing performance comparable to the new chip's, but demanded an enormous 500 kilowatts of power and another 500 kilowatts of cooling to run its nearly 10,000 Pentium Pro chips.
Shrunk onto a single chip, that power would allow average consumers to use their PCs in new ways. They could use improved search functions on the vast amounts of digital media stored on home desktops, searching large photo archives for specific attributes such as all the shots where a certain person is smiling, or where that person is posing with a friend, Bautista said.
Running at 3.16 GHz, the new chip achieves 1.01 TFLOPS of computation -- an efficiency of 16 GFLOPS per watt. It can run even faster, but loses efficiency at higher speeds, performing at 1.63 TFLOPS at 5.1 GHz and 1.81 TFLOPS at 5.7 GHz.
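The arithmetic behind those figures can be checked with a short sketch. It uses only numbers quoted in this article; the function names are illustrative, and because the article gives power draw only for the 3.16-GHz operating point (62 watts), efficiency is computed for that point alone. The per-core, per-cycle figure is a derived back-of-the-envelope value, not something Intel stated here.

```python
# Figures quoted in the article.
CORES = 80
POWER_WATTS_AT_3_16_GHZ = 62.0

# (clock speed in GHz, reported throughput in TFLOPS)
OPERATING_POINTS = [(3.16, 1.01), (5.1, 1.63), (5.7, 1.81)]


def gflops_per_watt(tflops: float, watts: float) -> float:
    """Energy efficiency: billions of floating-point operations per second per watt."""
    return tflops * 1000.0 / watts


def flops_per_core_per_cycle(tflops: float, ghz: float, cores: int = CORES) -> float:
    """Average floating-point operations each core completes per clock cycle."""
    return (tflops * 1e12) / (cores * ghz * 1e9)


if __name__ == "__main__":
    efficiency = gflops_per_watt(1.01, POWER_WATTS_AT_3_16_GHZ)
    print(f"Efficiency at 3.16 GHz: {efficiency:.1f} GFLOPS per watt")

    for ghz, tflops in OPERATING_POINTS:
        per_cycle = flops_per_core_per_cycle(tflops, ghz)
        print(f"{ghz} GHz: {per_cycle:.2f} FLOPs per core per cycle")
```

Under these numbers, throughput per core per cycle stays close to constant across the three clock speeds, which suggests the loss of efficiency at higher speeds comes from rising power draw rather than from lost work per cycle.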
The processor saves power by putting idle cores into sleep mode, then waking them instantly as they are needed, researchers said. Each modular tile has its own router built alongside the core, creating a "network on a chip."
Even with such an efficient grid, the researchers found that adding too many cores could actually hurt performance. Performance scaled up in step with the core count from two cores to four, eight and 16, but began to drop at 32 and 64 cores.
"If we simply added more than 16 cores, we would get diminishing returns, because the threads and data traffic would not be used properly, so the cores get in the way of each other. It's like having too many cooks in the kitchen," said Bautista.
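The "too many cooks" effect can be illustrated with a toy model. This is purely a sketch: the formula and the contention constant below are invented for illustration and are not Intel's measurements or methodology. The model assumes useful work grows linearly with core count while a penalty for shared threads and data traffic grows with the square of the core count.

```python
def toy_throughput(cores: int, contention: float = 1.0 / 256.0) -> float:
    """Relative throughput of `cores` cores under a made-up contention model.

    Ideal work grows linearly with core count, while a shared-traffic
    penalty grows with the square of the core count. Both the functional
    form and the `contention` value are illustrative assumptions, chosen
    so the curve peaks at 16 cores; they are not fitted to any data.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return cores / (1.0 + contention * cores ** 2)


if __name__ == "__main__":
    for n in (2, 4, 8, 16, 32, 64):
        print(f"{n:>2} cores -> relative throughput {toy_throughput(n):.2f}")
```

With the constant chosen here, the modeled throughput rises through 16 cores and then falls at 32 and 64, mirroring the shape Bautista describes. The toy curve is less than perfectly linear below 16 cores, so it should be read as a qualitative picture only.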
To solve the problem on the new chip, they used a hardware-based thread scheduler and faster on-chip memory caches, optimizing the way data flows from memory into each core. To improve the design further, Intel researchers plan to add a layer of "3-D stacked memory" under the chip to minimize the time and power required to feed the cores with data. Next, they will create a mega-chip that uses general-purpose cores instead of the floating-point units used in the current design.