Storage 2.0 -- Web-based storage is coming

Backup and archiving are the killer apps for now on open-source platforms

Combine open-source software, distributed storage running on low-cost hardware and the World Wide Web, and what do you get? Storage for as little as 15 cents per gigabyte per month, and another 10 to 20 cents for each gigabyte users upload or download.

That's a pretty good deal, especially when Andrew Reichman, an analyst at Forrester Research Inc., estimates that it costs $15 to $25 per gigabyte just to buy the hardware and software needed for secondary (backup or archival) storage, and $50 and up per gigabyte for the primary storage needed for business-critical applications such as stock trading or airline reservations. Neither of these prices takes into account ongoing management costs.
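To put the two price models side by side, here is a rough back-of-the-envelope sketch in Python. It uses only the per-gigabyte figures quoted above; the function name is illustrative, and the calculation deliberately ignores transfer fees, management costs and hardware refresh cycles, so treat the result as an order-of-magnitude guide rather than a cost model.

```python
def breakeven_months(upfront_per_gb: float, monthly_per_gb: float) -> float:
    """Months of hosted storage that cost as much as buying the capacity outright."""
    return upfront_per_gb / monthly_per_gb


# Low-end Web storage price quoted in the article, in dollars per GB per month.
WEB_STORAGE_PER_GB_MONTH = 0.15

# Upfront purchase prices per GB from the Forrester estimate.
scenarios = [
    ("secondary storage, low estimate", 15.0),
    ("secondary storage, high estimate", 25.0),
    ("primary storage", 50.0),
]

for label, upfront in scenarios:
    months = breakeven_months(upfront, WEB_STORAGE_PER_GB_MONTH)
    print(f"{label}: {months:.0f} months (about {months / 12:.1f} years)")
```

At the low secondary-storage estimate, the break-even point works out to roughly 100 months of hosted storage before upload and download charges are counted.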

But don't throw away your Fibre Channel storage-area network (SAN) yet. These Web-based services lack the performance required for online transactional applications or giant database queries. Then there's the question of security, and how much of their data companies will trust to a node somewhere in the Internet "cloud."

Still, if promising new technologies deliver, they could reduce corporate reliance on the proprietary, higher-priced storage hardware and software sold by industry giants such as EMC Corp., IBM and Hitachi Data Systems Inc., not to mention a host of smaller players.

The technologies

The first technology enabling this new storage platform is open-source storage software. (See "Open source software takes the storage stage.") This can take the form of tools for specific storage functions, such as the Amanda open-source backup tool and the Darik's Boot and Nuke (DBAN) disk-wiping utility. It also includes network file systems such as Lustre, OpenAFS and Samba, which can form the foundations of entire storage infrastructures.

The second technology is distributed grid- or cluster-based storage architectures from start-ups such as Cleversafe Inc. and established services such as MozyPro from Berkeley Data Systems Inc.

The third enabling technology is the use of industry-standard servers and disk drives in lieu of high-end storage arrays in these architectures.

Berkeley Data Systems, for example, bases its MozyPro online backup service on its own storage-clustering and file-serving software, which runs on "white box" (unbranded) servers in the company's data center and stores data on the servers' internal drives. The price: $4 per month for each desktop or server using the service, plus 50 cents per month for each gigabyte of data stored. Unlike other online storage providers that safeguard customers' data by storing multiple copies, Berkeley's software saves redundant data equal to 33% of the original, from which it can restore the complete original if needed. This means it must store only 33% more data than a customer sends it, compared with other storage providers that must store 300% of the original data, says Vance Checketts, vice president for products.
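The pricing and overhead figures in this paragraph translate into simple arithmetic, sketched below in Python. The function names are illustrative, and the ratios are one reading of Checketts' numbers: 1.33 for "33% more data than a customer sends" and 3.0 for "300% of the original data."

```python
def mozypro_monthly_bill(machines: int, stored_gb: float) -> float:
    """Monthly charge: $4 per protected machine plus 50 cents per GB stored."""
    return machines * 4.00 + stored_gb * 0.50


def raw_capacity_gb(customer_gb: float, stored_ratio: float) -> float:
    """Disk a provider must provision, where stored_ratio is total data
    stored expressed as a multiple of the data the customer sent."""
    return customer_gb * stored_ratio


# Example customer (hypothetical): 10 machines backing up 200 GB in total.
print(f"Monthly bill: ${mozypro_monthly_bill(10, 200):.2f}")

# Provider-side disk needed to protect 1,000 GB of customer data.
print(f"With 33% redundancy: {raw_capacity_gb(1000, 1.33):.0f} GB")
print(f"With full copies:    {raw_capacity_gb(1000, 3.0):.0f} GB")
```

Under these assumptions, the provider that stores full copies needs more than twice the disk of the one that stores 33% redundancy for the same customer data.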

Cleversafe, a 29-person start-up that is alpha-testing software it will offer to other companies to build open-source, Web-based distributed storage architectures, goes further. Its software uses algorithms to split encrypted data into 11 "slices," which are stored on distributed servers and must be combined to yield any usable information. Using the same algorithms, the software can re-create the original data from any six of the 11 slices. By eliminating the backup, archiving and restoration of entire files, Cleversafe reduces the amount of "extra" data a company must store to protect critical information from the current 300% or more of actual data to 130%, according to CEO Chris Gladwin. He also claims that the data slicing is inherently secure because no one storage node contains an entire copy of any file, making it harder to steal or corrupt. Availability is also assured because any five of the 11 nodes can fail, and the software can still recover the data, he says.
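The availability claim (any five of 11 nodes can fail, which implies that any six slices are enough) is the defining property of a "k-of-n" erasure code. The Python sketch below is a toy illustration of that property using polynomial interpolation over a small prime field. It is not Cleversafe's algorithm, which the article does not describe; all names are hypothetical, encryption is omitted, and the first six slices here contain the data in the clear, so the sketch demonstrates recoverability only, not the security claim. It also stores 11/6 of the input, so it does not reproduce the 130% figure Gladwin cites.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def _interpolate_at(points, x):
    """Evaluate at x the unique polynomial (degree < len(points)) through points, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def disperse(data: bytes, n: int = 11, k: int = 6):
    """Split data into n slices so that any k of them can rebuild it.

    Each block of k bytes defines a polynomial through (1, b1) ... (k, bk);
    slice x holds that polynomial's value at x. Values may reach 256, so
    slices are lists of ints rather than bytes.
    """
    padded = data + bytes(-len(data) % k)  # zero-pad to a multiple of k
    slices = {x: [] for x in range(1, n + 1)}
    for start in range(0, len(padded), k):
        block = padded[start:start + k]
        points = list(zip(range(1, k + 1), block))
        for x in slices:
            slices[x].append(_interpolate_at(points, x))
    return slices, len(data)


def rebuild(available: dict, length: int, k: int = 6) -> bytes:
    """Recover the original bytes from any k surviving slices."""
    if len(available) < k:
        raise ValueError("not enough slices to rebuild the data")
    chosen = sorted(available.items())[:k]
    out = bytearray()
    for b in range(len(chosen[0][1])):
        points = [(x, values[b]) for x, values in chosen]
        out.extend(_interpolate_at(points, x) for x in range(1, k + 1))
    return bytes(out[:length])


# Lose five of the 11 nodes and rebuild from the remaining six.
slices, length = disperse(b"storage 2.0")
survivors = {x: s for x, s in slices.items() if x not in {1, 3, 5, 7, 9}}
assert rebuild(survivors, length) == b"storage 2.0"
```

Production systems typically use optimized Reed-Solomon or similar codes over binary fields; the prime field is used here only because it keeps the arithmetic readable.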

The Internet Services Inc., a Houston-based hosting firm, is investigating Cleversafe as a way to use older servers to create low-cost storage grids. "Instead of going for three years or four years, with the proper upgrades in disk drives, we could get five to six years of life out of them, and at the same time, offer storage to our customers," says Chairman and CEO Doug Erwin.

Stelios Valavanis, president and founder of Onshore Networks LLC, a Chicago-based networking consultant, thinks that the security, rather than any cost savings, offered by Cleversafe could make it attractive to his clients. Both he and Erwin are waiting for Cleversafe to deliver new features later this year, such as a further reduction in the amount of "extra" code stored on a Cleversafe grid and the ability for users and applications to see the grid as a network drive, before deciding how to proceed.

Perhaps the biggest online player is Amazon.com Inc. (see " Unveils Data Storage Service"). Adam Selipsky, vice president of product management and developer relations for Amazon Web Services, says its S3 service is provided by "multiple arrays of storage servers at multiple locations, storing multiple copies" of customers' data. It is aimed at developers, who can afford to experiment with building innovative applications because of its low cost: 15 cents per month for each gigabyte of data stored, 10 cents for each gigabyte uploaded, and between 13 and 18 cents for each gigabyte downloaded. Selipsky declined to describe the technology used in S3 in any more detail, except to say that Amazon "predominantly uses open-source software" throughout its infrastructure.
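The S3 price list quoted here can be turned into a simple monthly estimate, sketched below in Python. The function and the example workload are hypothetical, and because the article does not say what determines where in the 13-to-18-cent range a download falls, the download rate is left as a parameter.

```python
def s3_monthly_bill(stored_gb: float, uploaded_gb: float,
                    downloaded_gb: float, download_rate: float) -> float:
    """Estimate a month's charges from the per-GB prices quoted in the article."""
    storage = stored_gb * 0.15       # dollars per GB stored per month
    upload = uploaded_gb * 0.10      # dollars per GB uploaded
    download = downloaded_gb * download_rate  # 0.13 to 0.18 dollars per GB
    return storage + upload + download


# Hypothetical workload: 500 GB stored, 50 GB uploaded, 100 GB downloaded.
low = s3_monthly_bill(500, 50, 100, download_rate=0.13)
high = s3_monthly_bill(500, 50, 100, download_rate=0.18)
print(f"Estimated monthly bill: ${low:.2f} to ${high:.2f}")
```

For this workload the estimate lands between roughly $93 and $98 a month, which is the kind of figure that fits the small experimental budgets Amazon says it is targeting.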

Move over, EMC?

John Webster, an analyst at Illuminata Inc., says the combination of open-source software and grid storage technologies could pose a real risk to vendors of copy, backup and disaster recovery software. "If this approach really works, it's a game changer" by fundamentally simplifying storage management, he says.

Some other observers, however, predict users will keep buying proprietary products for their most critical applications.

One reason is the inherent latency and unpredictability of the Internet, which a storage manager cannot tweak for rock-solid reliability and predictable response times. Security is another concern. Jeff Pieper, president of Pieper & Associates Inc., a Torrance, Calif., marketing design firm, is the type of SMB customer being courted by the online storage vendors. But he says he has to sign a multipage nondisclosure form with many of his customers and plans to keep their data on his 4TB SAN from Hitachi to be sure it's safe.

However, customers who build their own grids in-house would have control over their networks, and thus might be able to use them even for primary storage, says Webster.

Then there is the question of actual savings. Reichman says upfront costs for distributed storage are undoubtedly far lower than for in-house storage hardware, but it's still unclear how long-term management costs will compare. Gladwin says it's too early to discuss specific pricing for Cleversafe grids, but he says customers should see savings "at least proportional" to the reduced disk space, power, floor space and management they will require.

Reichman says the major storage hardware vendors will inevitably lose some business as customers move storage from in-house hardware to Web-based providers. But he says vendors that also sell servers could "make up some of the revenue" by selling low-priced servers and other "building blocks for the grid."

Valavanis believes grid-based storage could even be a boost for those vendors. "Even though Cleversafe allows you to use less expensive hardware, the reality is that big companies building grids in their IT departments will not tolerate buying cheap disks. The corporations who are buying EMC now and want to build on a grid model, who are they going to buy their disks from?" he asks.

Like other online vendors, Berkeley Data Systems founder and CEO Josh Coates sees MozyPro replacing tape-based backup more frequently than high-end disk. He says customers are abandoning tape systems "like hot rocks" because they are slower, less reliable and more complicated than online storage services offered by Berkeley and competitors such as Carbonite Inc.

Even Gladwin sees Cleversafe as a complement to, rather than a replacement for, current storage offerings. While backup is built into a Cleversafe grid by virtue of how data is stored, he still expects many customers to continue to take, for example, snapshots to capture the state of their data at a given point in time.

Reichman predicts that small to midsize businesses will likely be the first to use such services, to avoid the "tremendously difficult" job of managing their own storage. As these new technologies are proven, Reichman sees larger companies moving more secondary storage to such third-party vendors. Others may adopt such technologies internally, he says, allowing them to reap the cost savings while maintaining control over their own storage. Some banks are already evaluating such a move, he says.

Amazon's Selipsky argues there's a place for Amazon S3 in the enterprise because large companies, like smaller organizations, "want very simple, very easy to interact with, very easy to integrate, highly reliable services." He also says many departments or groups within large companies lack the budget or organizational ability to fund large infrastructure projects, but "might have $500 or $5,000 or $50,000 to mess with during a quarter, to prove a concept, to try something out."

No rush

Any move to grid storage won't happen overnight, nor does it have to. A dramatically new approach such as Cleversafe's needs evangelization, says Valavanis, as well as "understanding the technology to a certain extent."

Gladwin also points out that "IT organizations generally replace hardware every four years or so. If someone just bought a brand new architecture ... they're not going to scrap it six months later," he says. In two to three years, though, Gladwin expects "that distributed architectures will become often used for large data archival applications."

By that time the pioneers, on both the customer and the developer side, will have a much better idea of how big a storage revolution they have on their hands.

Storage 2.0: How Big a Deal?

What It Is: Storage services delivered over the Web, using open-source software, grid or clustered storage architectures, and/or low-cost standard hardware.

Pluses: Very low cost, easy manageability, very high levels of scalability.

Minuses: Lacks the performance and reliability required for very high-end applications. Security will be a concern for some customers.

Bottom Line: Worth evaluating at least for secondary applications such as backup and archiving. May also be worth deploying inside the corporate firewall.

Robert L. Scheier is a former technology editor at Computerworld and a freelance writer based in Boylston, Mass.

Copyright © 2007 IDG Communications, Inc.