Yahoo working on Hadoop MapReduce 2
The new version will deliver better resource management
Computerworld - The next generation of Apache Hadoop, the software implementation that allows batch processing of petabytes of data, is expected out this year, says a Yahoo executive.
Todd Papaioannou, vice president of cloud architecture at Yahoo, told Computerworld this week that current iterations of Hadoop lack the ability to effectively manage resources across thousands of servers in a cluster.
So developers are working on improving utilization, scheduling and management of resources.
For example, the new architecture will include a global ResourceManager that tracks server availability and scheduling invariants, while a per-application ApplicationMaster runs inside the cluster and tracks the program semantics for a given job, Yahoo developer Arun Murthy wrote in a blog post.
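The division of labor Murthy describes can be pictured with a toy model. The Python sketch below is illustrative only and is not Hadoop code: the class and method names (`allocate`, `release`, `request_and_launch`, `finish`) and the notion of a per-server "slot" count are assumptions made for this example. It shows one global object tracking free capacity per server while a per-job object tracks which of its own tasks are pending or running.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ResourceManager:
    """Global view of the cluster: how many free slots each server has.

    Hypothetical stand-in for the ResourceManager role; not Hadoop's API.
    """
    free_slots: Dict[str, int]

    def allocate(self, wanted: int) -> List[str]:
        """Grant up to `wanted` slots, walking servers in name order."""
        granted: List[str] = []
        for server in sorted(self.free_slots):
            while self.free_slots[server] > 0 and len(granted) < wanted:
                self.free_slots[server] -= 1
                granted.append(server)
        return granted

    def release(self, server: str) -> None:
        """Return one slot on `server` to the free pool."""
        self.free_slots[server] += 1


@dataclass
class ApplicationMaster:
    """Per-job bookkeeping: which tasks are waiting and where others run.

    Hypothetical stand-in for the ApplicationMaster role; not Hadoop's API.
    """
    job_name: str
    pending_tasks: List[str]
    running: Dict[str, str] = field(default_factory=dict)  # task -> server

    def request_and_launch(self, rm: ResourceManager) -> None:
        """Ask the global manager for slots and place pending tasks on them."""
        for server in rm.allocate(len(self.pending_tasks)):
            task = self.pending_tasks.pop(0)
            self.running[task] = server

    def finish(self, task: str, rm: ResourceManager) -> None:
        """Mark a task done and hand its slot back to the global manager."""
        rm.release(self.running.pop(task))
```

In this sketch the global manager knows nothing about what a job is computing, and the per-job master knows nothing about other jobs. That separation is the point the article attributes to the new design; the real scheduling logic is far more involved.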
Papaioannou said Yahoo contributed about 70% of the code for the current iteration of Hadoop and the Hadoop Distributed File System (HDFS).
Earlier this year, Yahoo dropped its own distribution of Hadoop and began working more closely with the Apache Hadoop community because doing so allows the open source community to help with development efforts, Papaioannou said.
Hadoop, an Apache project, uses an implementation of MapReduce, a programming technique for building parallel programs that originated at Google. MapReduce is what enables Hadoop to perform parallel batch processing.
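For readers unfamiliar with the technique, the following single-machine Python sketch shows the general shape of a MapReduce computation using the customary word-count example. It does not use Hadoop, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are chosen here for illustration. In a real cluster the map and reduce steps would run in parallel on many servers; here they run in sequence.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse the values for one key into a single total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
```

Because each line is mapped independently and each key is reduced independently, both steps can be spread across many machines, which is what makes the approach suitable for batch processing at scale.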
"The next generation of HDFS will be more resilient, available and reliable," Papaioannou said. "We expect to put it all together in a release some time soon. That's an exercise of collaboration with rest of the development community."
Yahoo also just launched a new project called HCatalog, a table metadata management system for Hadoop.
"That will help drive different use cases," he said. "It just went into Apache version last week."
Lucas Mearian covers storage, disaster recovery and business continuity, financial services infrastructure and health care IT for Computerworld. Follow Lucas on Twitter at @lucasmearian or subscribe to Lucas's RSS feed. His e-mail address is lmearian@computerworld.com.