How to manage big data overload

Complex requirements and relentless demands for capacity vex storage administrators. Here's how to handle the data deluge.

In a February 2012 Aberdeen survey of 106 large companies, only 20% of respondents said they had a single storage management application. On average, companies ran three management applications across 3.2 storage devices.

However, many storage vendors are reluctant to have their devices managed by another vendor's product. Storage virtualization is "much more complex [and] takes more time, so it hasn't caught on like server virtualization," Csaplar says. Instead, many storage administrators are looking at cloud-type implementations for third- or fourth-tier storage to move data more easily across different infrastructures and reduce storage costs. "Some companies have done it and gotten good results, but it's not a slam dunk," he adds.

Csaplar expects the use of cloud-based storage and other cloud computing resources to increase in the near future as network connectivity improves, costs decline and the ability to encrypt and decrypt data in flight matures. "With the cloud, you get a monthly bill paid out of the operational budget, not a separate capital budget," he says.

Deduplication and Compression

Administrators can shrink the amount of storage needed with deduplication, which identifies duplicate blocks or files and stores only a single copy of each, and with compression, which encodes short repeated strings within individual files more compactly.
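A minimal sketch of the deduplication idea, assuming fixed-size blocks and SHA-256 content hashing (real products typically use variable-size chunking and far more elaborate indexes):

```python
import hashlib

def deduplicate(data: bytes, block_size: int = 4096):
    """Split data into fixed-size blocks and keep one copy of each
    unique block, referencing duplicates by their content hash."""
    store = {}   # content hash -> block bytes (stored once)
    recipe = []  # ordered list of hashes used to rebuild the data
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # keep only the first copy
        recipe.append(digest)
    return store, recipe

def rebuild(store, recipe) -> bytes:
    """Reassemble the original data from the unique blocks."""
    return b"".join(store[digest] for digest in recipe)

# Highly repetitive data deduplicates well: 1,000 identical blocks
# shrink to one stored block plus a list of references.
data = b"x" * 4096 * 1000
store, recipe = deduplicate(data)
assert rebuild(store, recipe) == data
print(f"{len(recipe)} blocks reduced to {len(store)} unique block(s)")
```

As the example suggests, the savings depend entirely on how repetitive the data is, which is why the reduction figures below apply mainly to highly repetitive, structured data.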

How much can storage needs be reduced? In the Aberdeen survey, 13% of the respondents said they had reduced data by 50%, but a more likely figure for most enterprises would be a 30% to 50% reduction of highly repetitive, structured data, Csaplar says.

Storage Tiering

Once the business decides what data it wants to analyze, storage administrators can put the newest and most important data on the fastest and most reliable storage medium. As the data grows older, it can be moved to slower, cheaper storage. Systems that automate the storage tiering process are gaining ground, but they're still not widely used.

When developing storage levels, administrators must consider the storage technology, the speed of the device and the form of RAID needed to protect the data.
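A minimal sketch of an age-based tiering policy follows; the tier names, media and age thresholds are illustrative assumptions, not a standard, and real policies also weigh access frequency and business value:

```python
from datetime import datetime, timedelta
from typing import Optional

# Illustrative tiers, fastest and most expensive first. The media and
# thresholds here are assumptions for the sketch, not recommendations.
TIERS = [
    (timedelta(days=30),  "tier1-ssd"),      # hot: newest, most-queried data
    (timedelta(days=365), "tier2-sas"),      # warm: older operational data
    (timedelta.max,       "tier3-archive"),  # cold: rarely touched data
]

def tier_for(last_modified: datetime, now: Optional[datetime] = None) -> str:
    """Pick a storage tier based on how long ago the data changed."""
    age = (now or datetime.now()) - last_modified
    for max_age, tier in TIERS:
        if age <= max_age:
            return tier
    return TIERS[-1][1]

print(tier_for(datetime.now() - timedelta(days=3)))    # tier1-ssd
print(tier_for(datetime.now() - timedelta(days=400)))  # tier3-archive
```

Automated tiering systems essentially run a richer version of this decision continuously, migrating data in the background as it ages.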

The standard answer to failover is replication, usually in the form of RAID arrays. But at massive scale, RAID can create more problems than it solves, says Neil Day, senior vice president and CTO at Shutterfly, an online photo site that lets users store an unlimited number of images at their original resolution. The company's storage now exceeds 30 petabytes.

In a traditional RAID data storage scheme, copies of each piece of data are mirrored and stored across the disks of the array, ensuring integrity and availability. But a single piece of data, stored and mirrored that way, can inflate to require more than five times its size in storage. And as the drives used in RAID arrays get larger -- 3-terabyte drives are attractive from a density and power-consumption perspective -- the time it takes to bring a replacement for a failed drive back to full parity grows longer and longer.
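A back-of-the-envelope estimate shows why rebuild time is the worry here; the 100 MB/s sustained rebuild rate is an assumed figure for illustration only:

```python
# Rough rebuild-time estimate: rebuilding a replacement drive is
# bounded by sustained write throughput to that single drive.
# 100 MB/s is an assumed, illustrative rate; real rebuild speeds
# vary with controller load and competing production I/O.
DRIVE_TB = 3
REBUILD_MB_PER_S = 100

seconds = DRIVE_TB * 1_000_000 / REBUILD_MB_PER_S
print(f"~{seconds / 3600:.1f} hours to rebuild a {DRIVE_TB} TB drive")
# ~8.3 hours under ideal conditions; in practice often much longer,
# a wide window during which a second drive failure risks data loss.
```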
