Facebook to use 'cold storage' to deal with vast amounts of data
Facebook rethinks its data center storage plans to reduce costs while effectively storing billions of photos
IDG News Service - Facebook is rethinking the way it stores data to cope with the seven petabytes of new photos the social network's users upload every month.
As the number of photos grows, Facebook needs to find cheaper, less power-hungry ways to store them all, according to the company's vice president of infrastructure engineering.
Users upload about 300 million photos a day, more on special occasions, Facebook's Jay Parikh told the Structure Europe conference in Amsterdam on Wednesday. "Halloween is one of our biggest photo upload days of the year. We will get somewhere between probably 1 and 2 billion photos uploaded just in a single day," he said.
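The two figures Facebook gave, about 300 million photos a day and seven petabytes of new photos a month, can be combined into a rough average storage footprint per uploaded photo. This is a back-of-envelope estimate, not a number from Facebook. It assumes decimal petabytes (10^15 bytes), a 30-day month and a steady 300 million uploads per day. If the seven petabytes also counts resized copies or replicas, the real per-photo figure is smaller.

\[
\frac{7\times10^{15}\ \text{bytes/month}}{\left(300\times10^{6}\ \text{photos/day}\right)\times 30\ \text{days/month}}
= \frac{7\times10^{15}\ \text{bytes}}{9\times10^{9}\ \text{photos}}
\approx 7.8\times10^{5}\ \text{bytes} \approx 0.78\ \text{MB per photo}
\]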
Photos like the ones taken at Halloween soon lose their interest, with no one looking at them after a few days or weeks, but "our contract with our users is that we can't delete the data when it is not accessed, we have to keep it," he said. That led to the idea of putting the photos into a sort of "cold storage," Parikh said. To do that, Facebook plans to build a new data center with different types of storage, server hardware and network equipment that consumes less power and costs less than existing data centers, all without changing server response times, said Parikh.
But how efficient can Facebook make its cold storage? Lowering a data center's costs and power consumption usually comes at the expense of access speed.
Storing data on tape, for instance, cuts power consumption but severely slows down data access.
Amazon Web Services is following a middle path with its Glacier cloud storage service, which it pitches as an alternative to tape. The service is optimized for data that is infrequently accessed and for which retrieval times of several hours are acceptable.
That's much too slow for Facebook, according to Parikh. "I can't have a photo that you go access from five or ten years ago, and for me to show up a banner to the user that says: 'Hey, why don't you try again in 24 hours?' It's still got to be relatively real time," he said.
Most data centers in use today are designed to draw a lot of power for compute-intensive tasks. The "cold storage" technology Facebook is considering sits at the other extreme, said Parikh. "You need lots and lots of space but you don't need as much power," he said, adding that everything about the data center needs to be rethought to handle the problem at the scale Facebook faces.
At a high level, Facebook is working on software that will figure out how and where to store a piece of content in the infrastructure when it ages, said Parikh. "That will mean that the copies of the data will move around over time and utilize the different pieces of infrastructure that we will have optimized for the age of content." Some of the inventions in the software layer will allow Facebook to still respond quickly but to store data more cost effectively, he said.
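Parikh did not describe how this software decides where content goes, so the following is only a minimal sketch of the general idea of age-based tiering, not Facebook's design. It assumes three hypothetical tiers ("hot", "warm", "cold") and made-up thresholds based on how long a photo has gone unaccessed. The `Photo` record, `target_tier` and `plan_migrations` are illustrative names. The policy only ever moves photos between tiers and never deletes them, in line with the "we have to keep it" constraint.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical thresholds for illustration; Facebook has not published its policy.
HOT_MAX_IDLE = timedelta(days=7)    # accessed within the last week -> fast storage
WARM_MAX_IDLE = timedelta(days=90)  # accessed within ~3 months -> cheaper storage


@dataclass
class Photo:
    photo_id: str
    uploaded_at: datetime
    last_accessed: datetime  # equals uploaded_at until the first view
    tier: str = "hot"        # new uploads start on the fast tier


def target_tier(photo: Photo, now: datetime) -> str:
    """Choose a tier from how long the photo has gone without being accessed."""
    idle = now - photo.last_accessed
    if idle <= HOT_MAX_IDLE:
        return "hot"
    if idle <= WARM_MAX_IDLE:
        return "warm"
    return "cold"  # still stored and readable, just on low-power hardware


def plan_migrations(photos: list[Photo], now: datetime) -> list[tuple[str, str, str]]:
    """List (photo_id, from_tier, to_tier) moves; nothing is ever deleted."""
    moves = []
    for p in photos:
        tier = target_tier(p, now)
        if tier != p.tier:
            moves.append((p.photo_id, p.tier, tier))
    return moves


if __name__ == "__main__":
    now = datetime(2012, 11, 30, tzinfo=timezone.utc)
    photos = [
        Photo("a", now - timedelta(days=30), now - timedelta(days=29)),
        Photo("b", now - timedelta(days=2), now - timedelta(days=1)),
        Photo("c", now - timedelta(days=400), now - timedelta(days=200), tier="warm"),
    ]
    print(plan_migrations(photos, now))
```

Running it prints `[('a', 'hot', 'warm'), ('c', 'warm', 'cold')]`: photo "b" was viewed yesterday and stays put. Because a read updates `last_accessed`, an old photo that suddenly gets viewed would be planned back onto a faster tier on the next pass. A real system would presumably also weigh replication, hardware placement and the "relatively real time" retrieval requirement Parikh described.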
Cold storage will be part of Facebook's infrastructure in the next year or two, he said. Facebook plans to disclose and share the parts that it thinks are relevant through the Open Compute Project, an initiative started by Facebook to apply the open-source software collaboration model to the world of data center hardware.
Loek is Amsterdam Correspondent and covers online privacy, intellectual property, open-source and online payment issues for the IDG News Service. Follow him on Twitter at @loekessers or email tips and comments to firstname.lastname@example.org