CIA-backed Cleversafe announces 10-exabyte storage system

Massive repositories can be used for big data analytics

Object-based storage vendor Cleversafe today announced the availability of a storage system that can house up to 10 exabytes (that's 10 billion gigabytes) of data in a single pool of capacity.

To put a storage system of that size in perspective, 1,000 gigabytes is a terabyte, and a terabyte of storage can hold about 300 hours of video. Cleversafe's new storage system could hold 10 million times as much data as that.

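Building a 10-exabyte storage system would require about 4.7 million 3TB hard drives, more than the roughly 3.3 million that 10 exabytes divided by 3TB works out to, because dispersed storage keeps redundant slices of data. Today's 3TB hard drives can cost as little as $150, but even at that price the spinning disks alone for a system the size of Cleversafe's would cost $705 million.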
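The conversions and disk arithmetic in the paragraphs above can be rechecked in a few lines. This is a sketch that assumes decimal (SI) units, as drive vendors use them, and the article's $150 price for a 3TB drive; `capacity_summary` and `drives_for_budget` are hypothetical helper names, not anything from Cleversafe.

```python
# Recheck of the capacity and disk-cost arithmetic (illustrative helpers).
# Assumes decimal (SI) units: 1 TB = 10**3 GB and 1 EB = 10**9 GB.
GB_PER_TB = 10**3
GB_PER_EB = 10**9


def capacity_summary(usable_eb: int, drive_tb: int) -> dict[str, int]:
    """Express a usable capacity in GB and TB, plus a redundancy-free drive count."""
    usable_gb = usable_eb * GB_PER_EB
    usable_tb = usable_gb // GB_PER_TB
    # Ceiling division: drives needed if every byte were stored exactly once.
    min_drives = -(-usable_tb // drive_tb)
    return {"gb": usable_gb, "tb": usable_tb, "min_drives": min_drives}


def drives_for_budget(budget_usd: int, price_usd: int) -> int:
    """Number of whole drives a given spend buys at a fixed per-drive price."""
    return budget_usd // price_usd


summary = capacity_summary(usable_eb=10, drive_tb=3)
print(f"10 EB = {summary['gb']:,} GB = {summary['tb']:,} TB")
print(f"Drives if nothing were stored redundantly: {summary['min_drives']:,}")
print(f"Drives that $705 million buys at $150 each: {drives_for_budget(705_000_000, 150):,}")
```

At 10 million terabytes, the system holds 10 million times the 300 hours of video a single terabyte stores. The gap between roughly 3.3 million redundancy-free drives and the 4.7 million that the $705 million figure implies is consistent with dispersal storing extra slices, though the article does not state the exact overhead.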

Russ Kennedy, Cleversafe's vice president of product strategy, said the entire system, including racks, networking equipment and Cleversafe software, would run into the "single-digits" billions of dollars.

Cleversafe said it designed the 10-exabyte system for customers who want to capitalize on the intelligence gained through big data analytics, which requires ever-larger stores of unstructured data.

Although the company hasn't yet built the full system, it has created a reference configuration, tens of petabytes in size, dispersed across data centers in eight states, including New Jersey, California, Florida, Texas and Illinois.

"This configuration was built to prove it would work," Kennedy said. "We'll build it when [customers] want it. We have some very interested ones to date."

With worldwide Internet traffic growing 32% a year, companies looking to mine that data would "effectively analyze 80 exabytes of data per month by 2015," he said.
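Growth of 32% a year compounds quickly: 1.32 to the fifth power is just over 4, so traffic roughly quadruples in five years. The sketch below works the projection backward from the 80-exabyte figure. The article doesn't give a starting year or volume, so the horizons it loops over are hypothetical, and `compound` and `implied_start` are illustrative helper names.

```python
def compound(volume: float, annual_growth: float, years: int) -> float:
    """Volume after `years` of compound growth (annual_growth=0.32 means 32%)."""
    return volume * (1 + annual_growth) ** years


def implied_start(target: float, annual_growth: float, years: int) -> float:
    """Starting volume that grows to `target` after `years` at `annual_growth`."""
    return target / (1 + annual_growth) ** years


TARGET_EB_PER_MONTH = 80.0  # the 2015 figure quoted by Kennedy
GROWTH = 0.32               # 32% per year

# Hypothetical horizons: how large monthly volume would be N years before 2015.
for years in (3, 4, 5):
    start = implied_start(TARGET_EB_PER_MONTH, GROWTH, years)
    print(f"{2015 - years}: about {start:.1f} EB/month")
```

Running `compound` forward on any of those starting values returns the 80 EB/month target, which is a quick way to confirm the two functions are inverses.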

Cleversafe, a privately held company founded in 2004, is well funded: it has received more than $31 million in venture money, including funding from In-Q-Tel, a nonprofit venture firm that invests in startups on behalf of the CIA.

"To any company, data is a priceless component. However, it's only valuable if a company can effectively look across that data over time for trends or to analyze behavior and to do it cost-effectively," said Kennedy. "In its true sense, Cleversafe's limitless data storage solution is a critical foundational enabler to Big Data analytics."

Big data tools are being used to analyze everything from IP traffic patterns for fraudulent activity to purchasing patterns for online retailers.

The 10-exabyte design uses the same technology Cleversafe has sold since its inception. That technology, which the company calls Dispersed Storage, uses an erasure-coding method called the Cauchy Reed-Solomon Information Dispersal Algorithm to divide data before storing it.
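To make "dividing data" concrete, here is a minimal k-of-n dispersal sketch built on a Cauchy matrix, the structure that gives Cauchy Reed-Solomon its name: every square submatrix of a Cauchy matrix is invertible, so any k of the n slices are enough to solve for the original data. It is not Cleversafe's implementation. Production Cauchy Reed-Solomon codes work in a binary field GF(2^w) so that encoding reduces to XORs; this sketch swaps in arithmetic modulo the prime 257 to stay short. The function names and the width `n` and threshold `k` parameters are illustrative assumptions.

```python
# Sketch of k-of-n information dispersal with a Cauchy matrix.
# Assumption: arithmetic modulo the prime 257 stands in for the GF(2^w)
# arithmetic that production Cauchy Reed-Solomon codes use.
P = 257  # every byte value 0..255 is an element of this prime field


def inv(a: int) -> int:
    """Multiplicative inverse mod P via Fermat's little theorem (a != 0)."""
    return pow(a % P, P - 2, P)


def cauchy_matrix(n: int, k: int) -> list[list[int]]:
    """n x k matrix C[i][j] = 1 / (x_i - y_j), with x_i = i and y_j = n + j."""
    assert n + k <= P, "all x and y values must be distinct field elements"
    return [[inv(i - (n + j)) for j in range(k)] for i in range(n)]


def disperse(data: bytes, n: int, k: int) -> list[list[int]]:
    """Encode data into n slices; any k of them are enough to rebuild it."""
    padded = data + bytes(-len(data) % k)  # zero-pad to a multiple of k
    c = cauchy_matrix(n, k)
    chunks = [padded[t:t + k] for t in range(0, len(padded), k)]
    # Slice i holds row i of C times each k-byte chunk: one symbol per chunk.
    return [[sum(c[i][j] * chunk[j] for j in range(k)) % P for chunk in chunks]
            for i in range(n)]


def invert(m: list[list[int]]) -> list[list[int]]:
    """Invert a square matrix mod P by Gauss-Jordan elimination."""
    size = len(m)
    aug = [row[:] + [int(r == col) for col in range(size)] for r, row in enumerate(m)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if aug[r][col])
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = inv(aug[col][col])
        aug[col] = [v * scale % P for v in aug[col]]
        for r in range(size):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [(a - factor * b) % P for a, b in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def reconstruct(slices: dict[int, list[int]], n: int, k: int, length: int) -> bytes:
    """Rebuild the original bytes from any k slices, keyed by slice index."""
    assert len(slices) >= k, "need at least k slices"
    idx = sorted(slices)[:k]
    c = cauchy_matrix(n, k)
    # Any k rows of a Cauchy matrix form an invertible square Cauchy matrix.
    decode = invert([c[i] for i in idx])
    out = bytearray()
    for t in range(len(slices[idx[0]])):
        symbols = [slices[i][t] for i in idx]
        out.extend(sum(decode[j][r] * symbols[r] for r in range(k)) % P for j in range(k))
    return bytes(out[:length])


if __name__ == "__main__":
    message = b"Dispersed Storage sketch"
    n, k = 8, 5  # hypothetical width and threshold
    slices = disperse(message, n, k)
    # Drop any n - k = 3 slices; the remaining 5 still rebuild the message.
    survivors = {i: s for i, s in enumerate(slices) if i not in (1, 4, 6)}
    print(reconstruct(survivors, n, k, len(message)))
```

One cost of the prime-field shortcut is that a slice symbol can take the value 256, which does not fit in a byte; binary fields such as GF(2^8) map exactly onto bytes, which is one reason real codes use them.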

The divided, or "sliced," data, as Cleversafe calls it, is spread over TCP/IP across multiple storage nodes (server appliances), typically in three or four data centers. Like RAID, the algorithm stores parity-style redundant information, so if some slices are lost or become corrupted, the data can be rebuilt from the remaining slices, as long as enough of them survive.
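Spreading slices over several data centers is what lets a whole site fail without data loss, provided the surviving sites still hold at least the threshold number of slices. A sketch, assuming a simple round-robin placement (the article doesn't describe Cleversafe's actual placement policy) and hypothetical site names:

```python
from collections import Counter


def place_round_robin(width: int, sites: list[str]) -> list[str]:
    """Assign slice i to sites[i % len(sites)]; a hypothetical placement policy."""
    return [sites[i % len(sites)] for i in range(width)]


def survives_site_loss(placement: list[str], threshold: int) -> bool:
    """True if losing any one site still leaves at least `threshold` slices."""
    slices_per_site = Counter(placement)
    return all(len(placement) - lost >= threshold for lost in slices_per_site.values())


sites = ["site-a", "site-b", "site-c"]  # hypothetical data centers
for width, threshold in [(12, 8), (12, 9), (16, 10)]:
    placement = place_round_robin(width, sites)
    verdict = "survives" if survives_site_loss(placement, threshold) else "loses data"
    print(f"{threshold}-of-{width} across {len(sites)} sites: {verdict} on a site outage")
```

With 12 slices over three sites, each site holds 4, so an outage leaves 8: enough for an 8-of-12 scheme but not a 9-of-12 one. This mirrors, at data-center scale, the RAID trade-off between redundancy and usable capacity.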

"We're just using public Internet bandwidth. We use a number of network providers, both big and small telcos," Kennedy said.

Cleversafe's product offering consists of three devices: an Accesser node, which slices up data and then retrieves it; the Slicestor, the storage array that holds the data; and the Manager, a client that manages the storage network and offers various capacity reporting tools.

All data is stored in a single namespace, so storage capacity appears to a client server as a single pool. Because the slices cannot be reassembled without metadata held in a central database, and are unrecognizable on their own, the data is inherently secure, the company has said.

The 10-exabyte architecture adds a building block called the Portable Datacenter (PD), a collection of storage and network racks that can be easily deployed or moved, which allows storage capacity and performance to scale independently.

Each PD contains 21 racks holding 189 storage nodes, and each node has 45 drives of 3TB each. The geographically distributed PD model allows rapid scaling and mobility and is further optimized for site-failure tolerance and high availability, Cleversafe said. The 10-exabyte configuration spans 16 sites across the U.S., with 35 PDs per site and hundreds of simultaneous readers and writers, to provide what the company describes as instantaneous access to billions of objects.
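Multiplying out the PD figures shows how the design reaches exabyte scale. This sketch assumes the 16-site, 35-PD layout describes the full 10-exabyte build and that every drive is 3TB; `raw_capacity` is a hypothetical helper.

```python
def raw_capacity(sites: int, pds_per_site: int, nodes_per_pd: int,
                 drives_per_node: int, drive_tb: int) -> tuple[int, int]:
    """Return (total drives, raw TB) for a site -> PD -> node -> drive hierarchy."""
    drives = sites * pds_per_site * nodes_per_pd * drives_per_node
    return drives, drives * drive_tb


# Figures from the article: 189 nodes per PD, 45 drives per node, 3TB drives,
# 16 sites with 35 PDs each.
pd_drives, pd_tb = raw_capacity(1, 1, 189, 45, 3)
total_drives, total_tb = raw_capacity(16, 35, 189, 45, 3)

print(f"One PD: {pd_drives:,} drives, {pd_tb / 1_000:.1f} PB raw")
print(f"Full build: {total_drives:,} drives, {total_tb / 1_000_000:.2f} EB raw")
# Ratio of raw disk to the advertised 10 EB (10,000,000 TB) usable.
print(f"Raw-to-usable ratio: {total_tb / 10_000_000:.2f}")
```

That comes to about 4.76 million drives and roughly 14.3 EB of raw disk, about 1.43 times the advertised 10 EB; the article doesn't say how much of that margin goes to dispersal redundancy.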

"In order for companies to continue to protect their data assets and to glean insight from the vast amounts of new data being collected, they must consider technology alternatives beyond RAID in order to scale without limits," David Reinsel, an analyst at research firm IDC, said in a statement.

While Cleversafe has yet to receive any customer orders for a 10-exabyte system, Kennedy did say there's a lot of interest from "Fortune 50"-type corporations.

"The concept of dispersal and the ability to store large unstructured objects without having to copy or replicate is really the impetus behind this kind of system," he said. "Most state-of-the-art object-based storage systems rely on a second and third copy in order to preserve the data. We're obviously able to do that with one copy."


Copyright © 2012 IDG Communications, Inc.
