Facebook to use 'cold storage' to deal with vast amounts of data
Facebook rethinks its data center storage plans to reduce costs while effectively storing billions of photos
IDG News Service - Facebook is rethinking the way it stores data to cope with the seven petabytes of new photos the social network's users upload every month.
As the number of photos grows, Facebook needs to find cheaper, less power-hungry ways to store them all, according to the company's vice president of infrastructure engineering.
Users upload about 300 million photos a day, more on special occasions, Facebook's Jay Parikh told the Structure Europe conference in Amsterdam on Wednesday. "Halloween is one of our biggest photo upload days of the year. We will get somewhere between probably 1 and 2 billion photos uploaded just in a single day," he said.
Photos like the ones taken at Halloween soon lose their interest, with no one looking at them after a few days or weeks, but "our contract with our users is that we can't delete the data when it is not accessed, we have to keep it," he said. That led to the idea of putting the photos into a sort of "cold storage," Parikh said. To do that, Facebook plans to build a new data center with different types of storage, server hardware and network equipment that consume less power and cost less than those in existing data centers, all without changing server response times, said Parikh.
But how efficient can Facebook make its cold storage? When costs and power consumption in data centers are lowered, this usually happens at the expense of access speeds.
Storing data on tape, for instance, lowers power consumption but severely slows down data access.
Amazon Web Services is following a middle path with its Glacier cloud storage service, which it pitches as an alternative to tape. The service is optimized for data that is infrequently accessed and for which retrieval times of several hours are acceptable.
That's much too slow for Facebook, according to Parikh. "I can't have a photo that you go access from five or ten years ago, and for me to show up a banner to the user that says: 'Hey, why don't you try again in 24 hours?' It's still got to be relatively real time," he said.
Most data centers in use today are optimized to draw a lot of power for computationally intensive tasks. The "cold storage" technology Facebook is thinking of is at the other extreme, said Parikh. "You need lots and lots of space but you don't need as much power," he said, adding that everything about the data center needs to be rethought to handle the problem at the scale Facebook faces.
At a high level, Facebook is working on software that will figure out how and where to store a piece of content in the infrastructure when it ages, said Parikh. "That will mean that the copies of the data will move around over time and utilize the different pieces of infrastructure that we will have optimized for the age of content." Some of the inventions in the software layer will allow Facebook to still respond quickly but to store data more cost effectively, he said.
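Parikh did not describe how this software decides where content should live. Purely as a hypothetical illustration of age-based tiering, the sketch below assigns each photo to a storage tier according to how long ago it was last accessed and reports the moves needed. The tier names ("hot", "warm", "cold"), the 7- and 90-day thresholds, and the `Photo`, `choose_tier` and `rebalance` names are all assumptions, not Facebook's design. In keeping with Parikh's point that users' photos can't be deleted, the sketch only moves data between tiers and never drops it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical tiers and age thresholds (illustrative only, not Facebook's design).
# Each entry is (maximum time since last access, tier name), checked in order.
TIERS = [
    (timedelta(days=7), "hot"),    # recently viewed: fast, power-hungry storage
    (timedelta(days=90), "warm"),  # viewed occasionally
]
COLD_TIER = "cold"                 # anything older: dense, low-power storage


@dataclass
class Photo:
    photo_id: str
    last_accessed: datetime
    tier: str = "hot"              # new uploads start on the fastest tier


def choose_tier(photo: Photo, now: datetime) -> str:
    """Pick a tier based on how long ago the photo was last accessed."""
    age = now - photo.last_accessed
    for max_age, tier in TIERS:
        if age <= max_age:
            return tier
    return COLD_TIER


def rebalance(photos: list[Photo], now: datetime) -> list[tuple[str, str, str]]:
    """Move photos to their target tier; return (photo_id, old_tier, new_tier).

    Photos are only relocated, never deleted.
    """
    moves = []
    for photo in photos:
        target = choose_tier(photo, now)
        if target != photo.tier:
            moves.append((photo.photo_id, photo.tier, target))
            photo.tier = target
    return moves


if __name__ == "__main__":
    now = datetime(2020, 1, 1)
    photos = [
        Photo("a", now - timedelta(days=1)),
        Photo("b", now - timedelta(days=30)),
        Photo("c", now - timedelta(days=400)),
    ]
    print(rebalance(photos, now))
```

Run as a script, this prints `[('b', 'hot', 'warm'), ('c', 'hot', 'cold')]`: the day-old photo stays on the fast tier while older ones migrate toward cheaper storage. A real system would also have to weigh replica counts and retrieval latency, which this sketch ignores.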
Cold storage will be part of Facebook's infrastructure in the next year or two, he said. Facebook plans to disclose and share the parts that it thinks are relevant through the Open Compute Project, an initiative started by Facebook to apply the open-source software collaboration model to the world of data center hardware.
Loek is Amsterdam Correspondent and covers online privacy, intellectual property, open-source and online payment issues for the IDG News Service. Follow him on Twitter at @loekessers or email tips and comments to email@example.com