Hortonworks releases its Hadoop version
For its first Hadoop release, Hortonworks focused on making the data analysis software easy to deploy and monitor
IDG News Service - For the first production release of what will be its flagship Apache Hadoop distribution, Hortonworks has focused on providing a set of tools to help deploy, manage and extend the data analysis platform.
"Hortonworks' goal is to make Hadoop easy to use and consume," said John Kreisa, Hortonworks vice president of marketing.
Version 1 of the Hortonworks Data Platform (HDP), to be released June 15, will be Hortonworks' first production-ready product release. Hortonworks was set up a year ago by Yahoo, along with Benchmark Capital, to provide enterprise support for Hadoop, the large-scale data analysis platform. Yahoo played a pivotal role in the early development of Hadoop.
Hortonworks now competes with a number of other companies also offering support packages, including Cloudera, MapR and IBM. Microsoft has chosen Hortonworks' Hadoop distribution for use on its Azure cloud service, though that service, promised by the end of 2011, has not debuted yet.
Like other commercial Hadoop distributions, HDP bundles a number of open-source Hadoop components, including the latest versions of the Pig scripting engine, the Hive data warehousing software and the HBase database.
In addition to these basic components, Hortonworks has added several management and interoperability tools to the package, all of them also based on open-source projects.
To aid in management, the package includes a customized version of Apache Ambari, a Hadoop monitoring and lifecycle management program. With this software, an administrator can set up a single Hadoop instance across a number of servers. Once Hadoop is installed, the software then monitors performance of the servers as well as the Hadoop jobs themselves, presenting the data on a dashboard.
"The dashboards are customizable and the APIs [application programming interfaces] allow the management and monitoring functionality to be tied into third-party dashboards like Hewlett-Packard's OpenView or Teradata's Viewpoint," Kreisa said.
With this release, the management tools will only be able to manage a single cluster, though future versions may be able to manage multiple clusters, said Ari Zilka, Hortonworks chief products officer. The metrics captured include network utilization, throughput and latency, as well as usage of CPUs, memory and disks. Hadoop jobs are also measured, including the time it takes for a task to start, how many tasks are backlogged, how many data blocks a task uses and where those data blocks are located.
For data interoperability, the package includes a metadata catalogue that should make it easier for business intelligence and other data analysis products to query Hadoop datasets. Based on Apache HCatalog, this metadata repository provides pointers to Hadoop data in a set of tables that can be easily queried by tools commonly used for relational databases, enterprise data warehouses and other structured data systems.