Hortonworks previews next gen Apache Hadoop
Hadoop 2.0 moves beyond batch processing, offering a foundation for interactive queries and real-time analysis
IDG News Service - Hortonworks has released a preview distribution of the next generation of Apache Hadoop, one that promises to broaden the range of analysis that can be carried out on the data processing platform.
"Hadoop 2.0 is truly a fundamental architecture change, one that makes Hadoop significantly more than just a batch platform," said Arun Murthy, a founder of Hortonworks and one of the core engineers developing Hadoop. The update "will fuel a whole new wave of innovation," he said.
The Hortonworks Data Platform 2.0 Community Preview contains a number of new components for the Hadoop environment, most notably YARN (Yet Another Resource Negotiator), a successor to Hadoop's MapReduce job scheduler.
Hadoop started as a "single application platform," one primarily built for crawling and indexing Web content, Murthy said. Organizations are now looking to use it for other kinds of jobs, such as interactive querying or analysis of real-time streams of data.
YARN improves on MapReduce by expanding the types of jobs that can be run on a Hadoop platform. MapReduce could essentially manage only batch processing jobs, which execute data analysis across any number of nodes and return results only once the entire job has completed.
In contrast, YARN is a general-purpose resource management framework. It provides a foundation to run nonbatch processing jobs, such as those that run indefinitely on live streams of data, and those that involve interactive queries, in which users interrogate the data on the fly. "You can now have both the batch MapReduce jobs and interactive SQL queries running right next to each other in YARN," Murthy said.
Using YARN, "you have a cluster that is aware of all the different types of workloads and resource needs, so they can all cohabitate. You don't get one workload dominating or taking over all the resources of the cluster," said Shaun Connolly, Hortonworks' vice president of corporate strategy. Previously, organizations would have had to run separate clusters to execute different styles of jobs.
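To make the idea of workloads "cohabitating" without one taking over the cluster more concrete, here is a toy max-min fair-sharing calculation. It is a hypothetical sketch of the general principle only, not YARN's actual scheduler code or API; the function name `max_min_fair_share`, the workload names and the capacity figures are invented for illustration.

```python
def max_min_fair_share(capacity, demands):
    """Toy max-min fair split of `capacity` units among named workloads.

    Hypothetical illustration, not YARN code. Workloads that ask for less
    than an equal share get their full demand; whatever is left over is
    divided equally among the rest, repeating until capacity runs out or
    every demand is met.
    """
    allocation = {name: 0.0 for name in demands}
    remaining = float(capacity)
    unsatisfied = {name: float(d) for name, d in demands.items() if d > 0}
    while remaining > 1e-9 and unsatisfied:
        share = remaining / len(unsatisfied)
        # Workloads whose demand fits within an equal share are fully served.
        satisfied = {n: d for n, d in unsatisfied.items() if d <= share}
        if not satisfied:
            # Every remaining workload wants more than an equal share,
            # so the leftover capacity is split evenly among them.
            for n in unsatisfied:
                allocation[n] += share
            remaining = 0.0
            break
        for n, d in satisfied.items():
            allocation[n] += d
            remaining -= d
            del unsatisfied[n]
    return allocation


if __name__ == "__main__":
    # A large batch job cannot starve the smaller interactive and streaming jobs.
    print(max_min_fair_share(100, {"batch": 90, "sql": 20, "stream": 30}))
```

In this example the batch job is capped at the capacity left after the smaller workloads are served, rather than claiming 90 of the 100 units outright.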
HDP 2.0 includes a number of other new components as well, including Apache Tez, an add-on to YARN for speeding up large, interactive jobs, and Stinger, a collection of technologies that provides the ability to run SQL queries against a Hadoop repository.
This preview of HDP 2.0, a full Hadoop distribution, runs in either an Oracle VirtualBox or a VMware virtual environment.
Hortonworks announced HDP 2.0 at the 2013 Hadoop Summit, being held this week in San Jose, California. Also at the conference, Rackspace announced it would offer Hadoop as a service, with analysis tools from Pentaho. Splunk released a new tool, called Hunk, to explore Hadoop repositories. Data warehouse systems provider Teradata unveiled new Hadoop appliances. And VMware updated its vSphere virtualization management software to support Hadoop clusters.