Facebook moves 30-petabyte Hadoop cluster to new data center
Exponential growth in data volumes prompts Facebook's largest data migration project
Computerworld - As the world's largest social network, Facebook accumulates more data in a single day than many good-sized companies generate in a year.
Facebook stores much of the data on its massive Hadoop cluster, which has grown exponentially in recent years.
Today the cluster holds a staggering 30 petabytes of data or, as Facebook puts it, about 3,000 times more information than is stored by the Library of Congress. The Facebook data store has grown by more than a third in the past year, the company notes.
To accommodate the surging data volumes, the company earlier this year launched an effort to move the ever-growing Hadoop cluster to a new and bigger Facebook data center in Prineville, Ore. The biggest data migration effort ever at Facebook was completed last month, the company said.
Paul Yang, an engineer with Facebook's data infrastructure team, outlined details of the project this week on the company's blog site. Yang said the migration to the new Facebook data center was necessary because the company had run out of available power and space, leaving it unable to add nodes to the Hadoop cluster.
Yang was not immediately available to speak with Computerworld about the effort.
Facebook's experience with Hadoop is likely to be of interest to a growing number of companies that are tapping the Apache open source software to capture and analyze huge volumes of structured and unstructured data.
Much of Hadoop's appeal lies in its ability to break up very large data sets into smaller data blocks that are then distributed across a cluster of commodity hardware systems for faster processing.
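The block-splitting idea can be illustrated with a minimal Python sketch. It is a simplified illustration, not Hadoop's actual implementation: the function names, the 256-byte block size, the node names and the round-robin placement are all assumptions made for the example.

```python
from collections import defaultdict


def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a byte string into consecutive blocks of at most block_size bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def assign_blocks(num_blocks: int, nodes: list[str]) -> dict[str, list[int]]:
    """Assign block indexes to nodes round-robin (an assumed, simplified policy)."""
    if not nodes:
        raise ValueError("at least one node is required")
    placement = defaultdict(list)
    for index in range(num_blocks):
        placement[nodes[index % len(nodes)]].append(index)
    return dict(placement)


# Hypothetical data set and cluster, chosen only to keep the example small.
data = b"x" * 1000
blocks = split_into_blocks(data, block_size=256)
placement = assign_blocks(len(blocks), ["node-a", "node-b", "node-c"])

for node, indexes in placement.items():
    print(node, "holds blocks", indexes)
```

Because each node holds only some of the blocks, work on different blocks can proceed in parallel on different machines, which is the source of the faster processing described above.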
A Ventana Research report released this week showed that a growing number of enterprises have started using Hadoop to collect and analyze huge volumes of unstructured and machine-generated information, such as log and event data, search engine results, and text and multimedia content from social media sites.
Facebook said it uses Hadoop technology to capture and store billions of pieces of content generated by its members daily. The data is analyzed using the open source Apache Hive data warehousing tool set.
Other data-heavy companies using Hadoop in a similar manner include eBay, Amazon and Yahoo. Yahoo is a major contributor of Hadoop code.
Bloggers said in May 2010 that Facebook's Hadoop cluster was the largest in the world.
At the time, the cluster consisted of 2,000 machines: 800 16-core systems and 1,200 8-core systems. Each of the systems in the cluster stored between 12 and 24 terabytes of data.
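As a rough check on those May 2010 figures, the short Python sketch below works out the totals they imply. The variable names are illustrative, and the conversion assumes decimal units (1 petabyte = 1,000 terabytes); the results are simple arithmetic on the reported numbers, not figures published by Facebook.

```python
# Machine counts and per-machine figures as reported for May 2010.
machines_16_core = 800
machines_8_core = 1200
storage_tb_low, storage_tb_high = 12, 24  # terabytes stored per machine

total_machines = machines_16_core + machines_8_core
total_cores = machines_16_core * 16 + machines_8_core * 8

# Assumption: decimal units, 1 PB = 1,000 TB.
total_storage_pb_low = total_machines * storage_tb_low / 1000
total_storage_pb_high = total_machines * storage_tb_high / 1000

print(f"{total_machines} machines, {total_cores} cores")
print(f"{total_storage_pb_low:.0f} to {total_storage_pb_high:.0f} PB stored across the cluster")
```

On these assumptions the 2010 cluster had 22,400 cores and held somewhere between 24 and 48 petabytes across all of its machines.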
Facebook had a pair of potential methods for moving the cluster to a new data center, Yang said in his post.