Yahoo working on Hadoop MapReduce 2
The new version will deliver better resource management
Todd Papaioannou, vice president of cloud architecture at Yahoo, told Computerworld this week that current iterations of Hadoop lack the ability to effectively manage resources across thousands of servers in a cluster.
So developers are working on improving utilization, scheduling and management of resources.
For example, the new architecture will include a global ResourceManager that tracks server availability and scheduling invariants, while a per-application ApplicationMaster runs inside the cluster and tracks the program semantics for a given job, Yahoo developer Arun Murthy wrote in a blog post.
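The division of labor Murthy describes can be illustrated with a toy model. The sketch below is not Hadoop code and does not use any Hadoop API: the class names are borrowed from the article, while the methods (`allocate`, `run`), the slot-counting scheme and the node and task names are all assumptions made up for illustration. It shows only the idea that one global component knows about server capacity while a per-job component knows about the job's tasks.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceManager:
    """Toy global manager (hypothetical): knows only free slots per server."""
    free_slots: dict  # server name -> number of free slots

    def allocate(self, wanted):
        """Grant up to `wanted` slots; return one server name per granted slot."""
        granted = []
        for server in sorted(self.free_slots):
            while self.free_slots[server] > 0 and len(granted) < wanted:
                self.free_slots[server] -= 1
                granted.append(server)
        return granted


@dataclass
class ApplicationMaster:
    """Toy per-job master (hypothetical): knows the job's tasks, not the cluster."""
    job_name: str
    pending_tasks: list
    placements: dict = field(default_factory=dict)

    def run(self, rm):
        """Ask the manager for slots and place as many pending tasks as granted."""
        for server in rm.allocate(len(self.pending_tasks)):
            task = self.pending_tasks.pop(0)
            self.placements[task] = server
        return self.placements


# Three slots are free in total, so one of the four tasks has to wait.
rm = ResourceManager({"node-a": 2, "node-b": 1})
am = ApplicationMaster("wordcount", ["map-0", "map-1", "map-2", "reduce-0"])
placements = am.run(rm)
```

In this sketch the manager never sees task names and the application master never sees slot counts, which is the separation the new architecture is reported to aim for.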
Papaioannou said Yahoo contributed about 70% of the code for the current iteration of Hadoop and the Hadoop Distributed File System (HDFS).
Earlier this year, Yahoo dropped its own distribution of Hadoop and began working more closely with the Apache Hadoop community because it allows the open source community to help with development efforts, Papaioannou said.
Hadoop, an Apache project, uses an implementation of MapReduce, a programming technique for building parallel programs that originated at Google. MapReduce is what enables Hadoop to perform parallel batch processing.
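For readers unfamiliar with the technique, a word count is the usual way to show the MapReduce pattern. The sketch below runs in a single Python process and is not Hadoop's API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are assumptions chosen for illustration. In a real cluster the map and reduce calls would be spread across many servers.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["Hadoop runs MapReduce", "MapReduce runs in batch"])
```

Because each line is mapped independently and each key is reduced independently, both steps can be run in parallel, which is what makes the pattern suitable for batch processing on a cluster.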
"The next generation of HDFS will be more resilient, available and reliable," Papaioannou said. "We expect to put it all together in a release some time soon. That's an exercise of collaboration with rest of the development community."
Yahoo also just launched a new project called HCatalog, a table metadata management schema for Hadoop.
"That will help drive different use cases," he said. "It just went into Apache version last week."
Lucas Mearian covers storage, disaster recovery and business continuity, financial services infrastructure and health care IT for Computerworld. Follow Lucas on Twitter at @lucasmearian or subscribe to Lucas's RSS feed. His e-mail address is email@example.com.