Hortonworks previews next gen Apache Hadoop
Hadoop 2.0 moves beyond batch processing, offering a foundation for interactive queries and real-time analysis
IDG News Service - Hortonworks has released a preview distribution of the next generation of Apache Hadoop, one that promises to broaden the scope of the kinds of analysis that can be carried out on the data processing platform.
"Hadoop 2.0 is truly a fundamental architecture change, one that makes Hadoop significantly more than just a batch platform," said Arun Murthy, a founder of Hortonworks and one of the core engineers developing Hadoop. The update "will fuel a whole new wave of innovation," he said.
The Hortonworks Data Platform 2.0 Community Preview contains a number of new components for the Hadoop environment, most notably YARN (Yet Another Resource Negotiator), a successor to Hadoop's MapReduce job scheduler.
Hadoop started as a "single application platform," one primarily built for crawling and indexing Web content, Murthy said. Organizations are now looking to use it for other kinds of jobs, such as interactive querying or analysis of real-time streams of data.
YARN improves on MapReduce by expanding the types of jobs that can be run on a Hadoop platform. MapReduce could essentially manage only batch processing jobs, which execute data analysis across any number of nodes and return the results once the job has completed.
In contrast, YARN is a general-purpose resource management framework. It provides a foundation to run nonbatch processing jobs, such as those that run indefinitely on live streams of data, and those that involve interactive queries, in which users interrogate the data on the fly. "You can now have both the batch MapReduce jobs and interactive SQL queries running right next to each other in YARN," Murthy said.
Using YARN, "you have a cluster that is aware of all the different types of workloads and resource needs, so they can all cohabitate. You don't get one workload dominating or taking over all the resources of the cluster," said Shaun Connolly, Hortonworks vice president of corporate strategy. Previously, organizations would have had to run separate clusters to execute different styles of jobs.
HDP 2.0 includes a number of other new components as well, including Apache Tez, an add-on to YARN for speeding up large interactive jobs, and Stinger, a collection of technologies that provides the ability to run SQL queries against a Hadoop repository.
This preview of HDP 2.0, a full Hadoop distribution, runs in either Oracle VirtualBox or VMware virtual environments.
Hortonworks announced HDP 2.0 at the 2013 Hadoop Summit, being held this week in San Jose, California. Also at the conference, Rackspace announced it would offer Hadoop as a service, with analysis tools from Pentaho. Splunk released a new tool, called Hunk, to explore Hadoop repositories. Data warehouse systems provider Teradata unveiled new Hadoop appliances. And VMware updated its vSphere virtualization management software to support Hadoop clusters.