Open-source project to do cheap data storage across the Web

Cleversafe project looks to slice up the Web

Cleversafe Inc.'s CEO, Christopher Gladwin, can't be accused of thinking small. "Our plan is to create a method for the world to store its data in the same way that the Internet is a method for the world to inter-network," is the way he puts it.

That's tall talk, but then again, serial entrepreneur Gladwin knows what it takes to win in the technology business: In an earlier life, he built the successful online music service MusicNow and then sold it to Circuit City. This time, he's creating massive, secure, reliable storage for very low cost -- a more nettlesome technical problem than figuring out how to give music fans access to their favorite tunes. Cleversafe, for the moment a noncommercial open-source project, began as Gladwin's personal quest to "store his stuff for at least 50 years." He wasn't satisfied with his options: "Today you can get very secure, very reliable storage, but it's very expensive." Looking for an alternative, he ultimately began work on the Cleversafe Dispersed Storage Project.

At its heart, Cleversafe uses IDAs (information dispersal algorithms) to break data into a user-defined number of slices. Those slices can be stored across multiple servers or drives in disparate locations, then retrieved to reconstitute the original data.
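To make the slicing idea concrete, here is a minimal sketch of one classic information dispersal scheme. It is an illustration only, not Cleversafe's actual code: the function names `disperse`, `reconstruct` and `invert_mod`, the choice of arithmetic modulo 257, and the sample message are all assumptions made for this example. Each group of k bytes is treated as the coefficients of a polynomial, and each of the n slices stores that polynomial's value at a different point, so any k slices are enough to solve for the original bytes. The sketch covers only splitting and rebuilding; it does not try to demonstrate the privacy properties Gladwin describes.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def disperse(data, n, k):
    """Split data into n slices such that any k of them can rebuild it."""
    padded = data + bytes(-len(data) % k)  # zero-pad to a multiple of k
    slices = [[] for _ in range(n)]
    for start in range(0, len(padded), k):
        block = padded[start:start + k]    # k bytes = polynomial coefficients
        for x in range(1, n + 1):          # slice x-1 stores the value at point x
            y = 0
            for coeff in reversed(block):  # Horner's rule, highest power first
                y = (y * x + coeff) % P
            slices[x - 1].append(y)
    return slices


def invert_mod(matrix, p):
    """Invert a square matrix modulo a prime p by Gauss-Jordan elimination."""
    k = len(matrix)
    aug = [row[:] + [int(i == j) for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = next(r for r in range(col, k) if aug[r][col] % p)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, p)    # modular inverse (Python 3.8+)
        aug[col] = [v * inv % p for v in aug[col]]
        for r in range(k):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [(a - factor * b) % p
                          for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def reconstruct(available, k, length):
    """Rebuild the original bytes from any k slices, keyed by slice index."""
    indices = sorted(available)[:k]
    if len(indices) < k:
        raise ValueError("need at least k slices")
    # Row r holds the powers of the evaluation point used by slice indices[r].
    vander = [[pow(i + 1, j, P) for j in range(k)] for i in indices]
    inverse = invert_mod(vander, P)
    out = bytearray()
    for pos in range(len(available[indices[0]])):
        ys = [available[i][pos] for i in indices]
        for row in inverse:                # each row recovers one coefficient
            out.append(sum(a * y for a, y in zip(row, ys)) % P)
    return bytes(out[:length])             # drop the zero padding


message = b"store my stuff for at least 50 years"
slices = disperse(message, n=11, k=6)
survivors = {i: slices[i] for i in (0, 2, 5, 7, 8, 10)}  # any six slices
assert reconstruct(survivors, k=6, length=len(message)) == message
```

In this sketch the other five slices could be lost entirely and the message would still come back intact. Each slice is also roughly one-sixth the size of the original, so the 11 slices together take up less than two full copies would.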

The DSG (dispersed storage grid) approach doesn't rely on the standard model of making a copy of data -- even an encrypted one. Instead, information is broken up into pieces that are useless on their own. That makes the system inherently secure and private, because a lost or stolen slice wouldn't reveal useful information, and reliable, because the data can still be rebuilt from the slices that remain. "You don't have to trust the reliability of any hard drive, server, facility, company or person that stores part of the dispersed data," Gladwin says.

Then there's the cost. DSGs can use commodity hardware, because the reliability of any given system is not critical. According to Gladwin, if you dispersed the data slices across 11 commodity-grade servers (the current Cleversafe default) and stipulated that any six slices will allow you to re-create your data, your average unavailability would be just one hour every million years. As you increase the number of slices, that unavailability figure approaches zero.
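The arithmetic behind a figure like that can be sketched with a simple binomial model. The assumptions here are not Gladwin's stated ones: the sketch supposes each server is independently reachable with probability `p_up`, and the function name `unavailability` and the sample values of `p_up` are made up for illustration. With 11 slices and any six sufficient, the data is unavailable only when fewer than six servers are up at the same moment.

```python
from math import comb


def unavailability(n, k, p_up):
    """Probability that fewer than k of n independent servers are up."""
    return sum(comb(n, i) * p_up**i * (1 - p_up)**(n - i) for i in range(k))


HOURS_PER_MILLION_YEARS = 1_000_000 * 365.25 * 24

# Hypothetical per-server availabilities, chosen only to show the trend.
for p_up in (0.99, 0.999):
    u = unavailability(11, 6, p_up)
    print(f"p_up={p_up}: unavailable fraction {u:.3e}, "
          f"about {u * HOURS_PER_MILLION_YEARS:.3g} hours per million years")
```

Under this model, losing access requires at least six servers to be down at once, which is why modest per-server reliability can still yield a very small combined figure. Real servers do not fail independently, so the output is a rough guide at best.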

InfoWorld storage guru and blogger Mario Apicella is bullish on the technology, which he believes extends and redefines the concept of storage medium. "To paraphrase Sun," says Apicella, "Cleversafe's motto could be, 'The storage medium is the network.'"

Gladwin is certainly thinking along those lines. Consider the case of the Veterans Administration or any of the scores of organizations that have lost laptops containing sensitive information.

"You can access that information without storing it on your hard drive," Gladwin says. "If you made a DSG your storage medium, as opposed to the laptop hard drive, you could reconstitute that information in memory when needed." That would certainly require some policy changes, but for that class of data, he believes, change is essential.

The primary application of Cleversafe's technology is data archiving, followed by backup. But the company has bigger plans, including a commercial version -- still open source -- sometime in the coming year. Gladwin envisions people building out private as well as public grids, where companies could sell dispersed storage as a service. Cleversafe hopes to make a business out of providing services, tools and management for both those scenarios.

And if some rabid music fan wanted to create a Cleversafe DSG to store his MP3 collection, no doubt Gladwin would approve of that as well.

This story, "Open-source project to do cheap data storage across the Web," was originally published by InfoWorld.

Copyright © 2007 IDG Communications, Inc.
