How to manage big data overload



Complex requirements and relentless demands for capacity vex storage administrators. Here's how to handle the data deluge.

Big data used to be only for scientists, Internet giants and the mega-social-media set -- Amazon, Twitter, Facebook, Shutterfly. But now, more and more enterprises of all kinds are aiming to gain a competitive edge by tapping into big data in hopes of unearthing the valuable information it can hold. Today, companies such as Walmart, Campbell Soup, Pfizer, Merck and convenience store chain Wawa have big plans for their big data.

Some are venturing into big data analytics to respond to customers faster, keep better track of customer information or get new products to market quicker.

"Any business in this Internet Age, if they don't do it, their competition is going to do it," says Ashish Nadkarni, a storage analyst at IDC.

In a February 2012 Aberdeen survey of 106 large companies, only 20% of the respondents said that they have a single storage management application. The average was three management applications for 3.2 storage devices.

However, many storage vendors are reluctant to have their devices managed by another vendor's product. Storage virtualization is "much more complex [and] takes more time, so it hasn't caught on like server virtualization," Csaplar says. Instead, many storage administrators are looking at cloud-type implementations for third- or fourth-tier storage to move data more easily across different infrastructures and reduce storage costs. "Some companies have done it and gotten good results, but it's not a slam dunk," he adds.

Csaplar expects to see an increase in utilization of cloud-based storage and other cloud-based computing resources in the near future as network connectivity improves, costs decline and the ability to encrypt and decrypt data in flight improves. "With the cloud, you get a monthly bill paid out of the operational budget, not a separate capital budget," he says.

Shutterfly eventually adopted erasure code technology, where a piece of data can be broken into chunks, each useless on its own, and dispersed to different disk drives or servers. At any time, the data can be fully reassembled with a fraction of the chunks, even if multiple chunks have been lost due to drive failures. In other words, you don't need to create multiple copies of data; a single instance can ensure data integrity and availability. Because erasure codes are software-based, the technology can be used with commodity hardware, bringing down the cost of scaling even more.
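The "any fraction of the chunks" idea can be illustrated with a small sketch. This is not Shutterfly's or any vendor's implementation; it is a toy k-of-n erasure code built on polynomial interpolation over the integers modulo 257, and the names `encode`, `decode`, `k` and `n` are assumptions made for the example. Every group of k bytes defines a polynomial, each of the n chunks stores one evaluation of it, and any k surviving chunks are enough to rebuild the bytes.

```python
# Toy k-of-n erasure code (illustrative only; not a production algorithm).
# Requires Python 3.8+ for pow(x, -1, P).
P = 257  # smallest prime above 255, so every byte value is a field element


def _interpolate(points, x):
    """Evaluate at x the unique polynomial through `points`, modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # Multiply by the modular inverse of den instead of dividing.
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data: bytes, k: int, n: int) -> dict:
    """Return n chunks (keyed 1..n); any k of them can rebuild `data`."""
    if not 1 <= k <= n < P:
        raise ValueError("need 1 <= k <= n < 257")
    padded = data + bytes(-len(data) % k)  # zero-pad to a multiple of k
    chunks = {x: [] for x in range(1, n + 1)}
    for start in range(0, len(padded), k):
        group = padded[start:start + k]
        # Byte i of the group is the polynomial's value at x = i + 1.
        points = [(i + 1, b) for i, b in enumerate(group)]
        for x in chunks:
            chunks[x].append(_interpolate(points, x))
    return chunks


def decode(chunks: dict, k: int, length: int) -> bytes:
    """Rebuild the original bytes from any k surviving chunks."""
    xs = sorted(chunks)[:k]
    if len(xs) < k:
        raise ValueError("not enough chunks survive to rebuild the data")
    out = bytearray()
    for row in range(len(chunks[xs[0]])):
        points = [(x, chunks[x][row]) for x in xs]
        # Re-evaluate the polynomial at x = 1..k to recover the bytes.
        out.extend(_interpolate(points, x) for x in range(1, k + 1))
    return bytes(out[:length])


if __name__ == "__main__":
    photo = b"pretend this is a customer photo"
    chunks = encode(photo, k=4, n=7)
    for lost in (1, 3, 6):  # simulate three failed drives
        del chunks[lost]
    assert decode(chunks, k=4, length=len(photo)) == photo
```

With k=4 and n=7, the data survives the loss of any three chunks while consuming 1.75 times the original space, compared with three times for keeping three full copies. In this simplified sketch the first k chunks hold the raw bytes; a scheme in which each chunk is useless on its own, as described above, would need an additional transform that is not shown here.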

One of the early vendors of erasure-code-based software is Cleversafe, which has added location information to create what it calls dispersal coding, allowing users to store chunks -- or slices, as it calls them -- in geographically separate places, like multiple data centers.

Mega-Big-Data Users
