Really big data: The challenges of managing mountains of information

Shops that shepherd petabyte-scale stores of data have figured out some interesting methods for getting the job done.

Page 3 of 3

Overall, the automaker is moving to a "business continuance model" as opposed to a pure disaster-recovery model, he explains. Instead of having backup and offsite storage that would be available to retrieve and restore in a typical disaster-recovery scenario, "we will instead replicate both live and backed-up data to a colocation facility."

In this scenario, Tier 1 applications will be brought online almost immediately in the event of a primary site failure. Other tiers will be restored from backup data that had been replicated to the colocation facility.
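The tier-ordered failover described above can be sketched roughly as follows. This is a hypothetical illustration, not the automaker's actual system: the application names, tier assignments and the `replicated_live` flag are all invented for the example.

```python
# Hypothetical sketch of tier-ordered recovery after a primary-site failure.
# Tier 1 apps come up from the live replica at the colocation facility;
# lower tiers are restored from backup data replicated to the same site.
from dataclasses import dataclass

@dataclass
class App:
    name: str
    tier: int              # 1 = brought online almost immediately
    replicated_live: bool  # True if data is replicated live, not just backed up

def recovery_plan(apps):
    """Return (app, data source) pairs in the order they come back online."""
    plan = []
    for app in sorted(apps, key=lambda a: a.tier):
        source = "live replica" if app.replicated_live else "replicated backup"
        plan.append((app.name, source))
    return plan

apps = [
    App("reporting", tier=3, replicated_live=False),
    App("order-entry", tier=1, replicated_live=True),
    App("analytics", tier=2, replicated_live=False),
]
plan = recovery_plan(apps)
print(plan)  # order-entry first, from the live replica; the rest from backup
```

The point of ordering by tier is that the business-critical applications never wait behind a bulk restore of everything else.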

Boosting speed with an appliance

The Nielsen Company, the ratings service that helps determine how long TV shows stay on the air, analyzes the audience for local shows in about 20,000 homes and tracks national shows in about 24,000 homes. After various steps -- including calculation, analysis and quality assurance -- the ratings are released to clients within about 24 hours after the initial telecast.

Scott Brown, Nielsen's senior vice president for client insights, says the data is collected in a central processing facility in Florida and some 20TB of data is then stored in Florida and in Ohio. The company uses a series of high-speed SANs and network-attached storage, mostly from EMC, although Brown declined to provide specifics.


Much of the process of generating reports from Nielsen's data warehouses is automated, but there is manual control too. Employees can call up data about a specific report from years earlier, and managers can create custom reports about viewer data.

Fast access to viewer data is business-critical, Brown says, and for that the company uses IBM Netezza appliances for its data warehouses. Tags are automatically added to data so that specific measurement details can be retrieved. For example, Nielsen can find out how many viewers activated surround-sound audio or whether they used a Boxee device for scheduling their shows.

"We have very granular information needs, and we sometimes want the information summarized up to a broader level -- say, for a customized study of viewer habits," says Brown.
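The pattern Brown describes -- tagging records at a granular level, then answering both detailed queries and broader summaries from the same data -- can be illustrated with a toy example. The record fields (`device`, `surround_sound`) and values here are invented for illustration; they are not Nielsen's actual schema.

```python
# Toy illustration of granular tags supporting both detailed and
# summarized queries; the schema and values are invented examples.
from collections import Counter

records = [
    {"household": 1, "show": "News at 9", "device": "Boxee",   "surround_sound": True},
    {"household": 2, "show": "News at 9", "device": "set-top", "surround_sound": False},
    {"household": 3, "show": "Late Show", "device": "Boxee",   "surround_sound": True},
]

# Granular query: how many viewers activated surround-sound audio?
surround = sum(1 for r in records if r["surround_sound"])

# Summarized up to a broader level: viewing counts by device type.
by_device = Counter(r["device"] for r in records)

print(surround)   # 2
print(by_device)  # Counter({'Boxee': 2, 'set-top': 1})
```

An appliance like Netezza does the same kind of filtering and aggregation in parallel across the warehouse rather than in application code, which is what makes it fast at scale.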

Adapting the techniques

These organizations are proving grounds for methods of handling tremendous amounts of data. StorageIO's Schulz says other companies can mimic some of their processes, including running checksums against files, incorporating metadata and using replication to make sure data is always available.
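The first of those techniques -- running checksums against files -- is straightforward to adopt. A minimal sketch, using Python's standard `hashlib` to detect whether a replicated copy still matches its source (the function names here are my own, not from any of the shops profiled):

```python
# Minimal sketch of file checksumming to verify a replicated copy.
# Streaming in chunks keeps memory flat even for very large files.
import hashlib
import os
import tempfile

def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Compute a hex digest of a file, reading it in 1MB chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_replica(source_sum, replica_path):
    """A mismatch means the copy (or the original) was corrupted."""
    return file_checksum(replica_path) == source_sum

# Demo with a throwaway file standing in for a replicated object.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"example data")
    tmp = f.name
source_sum = file_checksum(tmp)
ok = verify_replica(source_sum, tmp)
print(ok)  # True
os.remove(tmp)
```

Recording the source checksum alongside the file (as metadata) and re-verifying it at the replica combines two of the three practices Schulz lists in one step.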

When it comes to handling massive amounts of data, Schulz says the most important point to remember is that it's critical to use technology that matches your organization's needs, not the system that's cheapest or the one that happens to be popular at the moment.

In the end, the biggest lesson may be that while big data poses many challenges, there are many avenues to success.

John Brandon is a former IT manager at a Fortune 100 company who now writes about technology. He has written more than 2,500 articles in the past 10 years. Follow his tweets at @jmbrandonbb.

Copyright © 2011 IDG Communications, Inc.
