Hadoop becomes critical cog in the big data machine
As more and more companies use Hadoop to handle big data, anticipation for forthcoming Version 2.0 grows
InfoWorld - Apache's Hadoop technologies are becoming critical in helping enterprises manage vast amounts of data, with users ranging from NASA to Twitter to Netflix increasing their reliance on the open source distributed computing platform.
Hadoop has gathered momentum as a mechanism for handling big data, in which enterprises seek to derive value from the rapidly growing amounts of data in their computer systems. Recognizing Hadoop's potential, users are both adopting the existing Hadoop platform technologies and developing their own technologies to complement the Hadoop stack.
Hadoop's corporate usage now and in the future
NASA expects Hadoop to handle large data loads in projects such as its Square Kilometer Array sky-imaging effort, which will churn out 700TBps when built in the next decade. The data systems will include Hadoop, as well as technologies such as Apache OODT (Object Oriented Data Technology), to cope with the massive data loads, says Chris Mattmann, a senior computer scientist at NASA.
Twitter is a big user of Hadoop. "All of the relevance products [offering personalized recommendations to users] have some interaction with Hadoop," says Oscar Boykin, a Twitter data scientist. The company has been using Hadoop for about four years and has even developed Scalding, a Scala library intended to make it easy to write Hadoop MapReduce jobs; it is built on top of the Cascading Java library, which is designed to abstract away Hadoop's complexity.
Hadoop subprojects include MapReduce, which is a software framework for processing large data sets on compute clusters; HDFS (Hadoop Distributed File System), which provides high-throughput access to application data; and Common, which offers utilities to support other Hadoop subprojects.
Movie rental service Netflix has begun using Apache ZooKeeper, a Hadoop-related technology for configuration management. "We use it for all kinds of things: distributed locks, some queuing, and leader election" for prioritizing service activity, says Jordan Zimmerman, a senior platform engineer at Netflix. "We open-sourced a client for ZooKeeper that I wrote called Curator"; the client serves as a library for developers to connect to ZooKeeper.
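For readers unfamiliar with the MapReduce model mentioned above, the following is a minimal single-process sketch in plain Python, not Hadoop's actual API. It assumes a word-count task, and the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, chosen only for illustration. A real Hadoop job would distribute these same phases across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an in-memory list of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Illustrative input only; not data from the article.
counts = word_count(["big data big cluster", "data cluster data"])
```

Because each map call looks at one line and each reduce call looks at one key, the work can be split across many machines, which is roughly what makes the model suit large clusters.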
The Tagged social network is using Hadoop technology for data analytics, processing about half a terabyte of new data daily, says Rich McKinley, Tagged's senior data engineer. Hadoop is being applied to tasks beyond the capacity of Tagged's Greenplum database, which is still in use at the company: "We're looking toward doing more with Hadoop just for scale."