When it comes to high-speed data processing, RAM has always been the go-to memory for computers because it's tens of thousands of times faster than disk drives and many times faster than NAND flash.
Researchers at MIT, however, have built a server network showing that, for big data applications, flash is just as fast as RAM and vastly cheaper.
In the age of big data, where massive data sets are used to uncover the purchasing trends of millions of people or predict financial market trends based on millions of data points, a single computer's RAM won't do.
For example, the data needed to process a single human genome would fill the RAM of between 40 and 100 typical computers.
NAND flash is about a tenth as expensive as RAM and consumes about a tenth as much power. So at the International Symposium on Computer Architecture last month, MIT researchers revealed a new system that proved flash memory is just as efficient as conventional RAM while also cutting power and hardware costs.
"Say, we need to purchase a system [to] process a dataset that is 10TBs large. To process it in DRAM, we would need a cluster of about 100 computers, assuming servers with 100GB of DRAM," Arvind Mithal, the Johnson Professor of Computer Science and Engineering at MIT, said in an email reply to Computerworld. "Such a cluster will cost around $400K to build."
In the MIT prototype, each server was connected to a field-programmable gate array, or FPGA, a kind of chip that can be reprogrammed to mimic different types of electrical circuits. Each FPGA, in turn, was connected to two 500GB flash chips and to the two FPGAs nearest it in the server rack.
Networked together, the FPGAs became a fast network that allowed any server to retrieve data from any flash drive. The FPGAs also controlled the flash drives.
Arvind, as Mithal typically goes by, said that processing the same 10TB dataset in flash would require only 10 computers, each with 1TB of flash storage. Even including the cost of FPGA-based accelerator hardware, the total price of the system would be less than $70,000, he said.
"This price may go down even further if we consider the fact we don't need as much DRAM on each server on a flash based system," Arvind said. "If we use a lower-end server with less DRAM, the system will cost around $40K."
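The arithmetic behind Arvind's comparison can be checked with a short Python sketch. It uses only the figures he quotes (a 10TB dataset, 100GB of DRAM or 1TB of flash per server, and cluster prices of $400K, $70K and $40K). The function and variable names are illustrative, decimal units (1TB = 1,000GB) are assumed, and the $70K figure is treated as an upper bound because the quote says "less than."

```python
import math


def servers_needed(dataset_gb: int, capacity_per_server_gb: int) -> int:
    """Smallest number of servers whose combined capacity holds the dataset."""
    return math.ceil(dataset_gb / capacity_per_server_gb)


# Figures quoted in the article; decimal units are an assumption.
DATASET_GB = 10_000            # 10TB dataset
DRAM_PER_SERVER_GB = 100       # 100GB of DRAM per server
FLASH_PER_SERVER_GB = 1_000    # 1TB of flash per server

DRAM_CLUSTER_COST = 400_000    # "around $400K"
FLASH_CLUSTER_COST = 70_000    # "less than $70,000", used here as an upper bound
FLASH_LOW_END_COST = 40_000    # "around $40K" with lower-end servers

dram_servers = servers_needed(DATASET_GB, DRAM_PER_SERVER_GB)
flash_servers = servers_needed(DATASET_GB, FLASH_PER_SERVER_GB)

# Implied average cost per server, derived from the quoted totals.
dram_cost_per_server = DRAM_CLUSTER_COST / dram_servers
flash_cost_per_server = FLASH_CLUSTER_COST / flash_servers

# How many times cheaper the flash clusters are than the DRAM cluster.
savings_standard = DRAM_CLUSTER_COST / FLASH_CLUSTER_COST
savings_low_end = DRAM_CLUSTER_COST / FLASH_LOW_END_COST

print(f"DRAM cluster:  {dram_servers} servers, ${dram_cost_per_server:,.0f} each")
print(f"Flash cluster: {flash_servers} servers, ${flash_cost_per_server:,.0f} each")
print(f"Flash is {savings_standard:.1f}x cheaper, {savings_low_end:.1f}x with low-end servers")
```

On these numbers, each flash server costs more than each DRAM server once the FPGA hardware is included, but needing a tenth as many machines makes the whole cluster several times cheaper.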
Maintaining a flash-based system is also much cheaper, he continued, because flash consumes much less power than DRAM and because it requires fewer servers. Even when the additional power consumption of the flash and FPGA accelerators was factored in, MIT's server network prototype showed that the flash storage device added only about 10% to the power consumption of the whole system.
In fact, even without their new network configuration, the researchers showed that if servers working on a distributed computation have to retrieve data from disk drives just 5% of the time, performance is the same as if they were using flash.
For example, 40 servers with 10TB of RAM could not handle a 10.5TB computation any faster than 20 servers with 20TB worth of flash memory. And the flash would cost less and consume a fraction of the power.
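A rough way to see why a small share of disk accesses erases RAM's advantage is a weighted-average latency model. The sketch below is not the researchers' method, and the latency constants are order-of-magnitude assumptions chosen for illustration, not measurements from the MIT system; only the 5% disk fraction comes from the article.

```python
# Illustrative access latencies in seconds. These are ASSUMED
# order-of-magnitude values, not figures from the MIT research.
DRAM_LATENCY_S = 100e-9    # assumed: ~100 nanoseconds
FLASH_LATENCY_S = 100e-6   # assumed: ~100 microseconds
DISK_LATENCY_S = 10e-3     # assumed: ~10 milliseconds


def effective_latency(disk_fraction: float, fast_s: float, disk_s: float) -> float:
    """Average access time when a fraction of accesses falls through to disk."""
    if not 0.0 <= disk_fraction <= 1.0:
        raise ValueError("disk_fraction must be between 0 and 1")
    return (1.0 - disk_fraction) * fast_s + disk_fraction * disk_s


def breakeven_disk_fraction(fast_s: float, flash_s: float, disk_s: float) -> float:
    """Disk fraction at which the RAM-plus-disk average equals flash latency."""
    return (flash_s - fast_s) / (disk_s - fast_s)


# The article's case: a RAM-based cluster that goes to disk 5% of the time.
ram_with_disk = effective_latency(0.05, DRAM_LATENCY_S, DISK_LATENCY_S)
breakeven = breakeven_disk_fraction(DRAM_LATENCY_S, FLASH_LATENCY_S, DISK_LATENCY_S)

print(f"RAM with 5% disk accesses: {ram_with_disk * 1e6:.1f} microseconds on average")
print(f"All-flash access:          {FLASH_LATENCY_S * 1e6:.1f} microseconds")
print(f"Break-even disk fraction:  {breakeven:.2%}")
```

Under these assumed latencies the disk term dominates the average, so the RAM-based cluster's effective access time is no better than flash well before the disk fraction reaches 5%. Different latency assumptions shift the break-even point but not the overall shape of the argument.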
The researchers were able to make a network of 20 flash-based servers competitive with a network of RAM-based servers by moving some of the computational power off the servers and onto the flash drives' controller chips.
The researchers used flash drives to preprocess some of the data before passing it back to the servers, increasing the efficiency of the distributed computation.
"This is not a replacement for DRAM [dynamic RAM] or anything like that," Arvind said.
Arvind performed the work with a group of graduate students and researchers at Quanta Computer. The research showed there are likely many applications that can replace RAM with flash and take advantage of a flash-based computer architecture's lower cost.
"Everybody's experimenting with different aspects of flash. We're just trying to establish another point in the design space," Arvind said.