
Yale researchers create database-Hadoop hybrid

HadoopDB pairs the data-crunching prowess of a relational database with Hadoop's scalability

By Eric Lai
July 21, 2009 07:19 AM ET

Computerworld - Yale University researchers have released an open-source parallel database that they say combines the data-crunching prowess of a relational database with the scalability of next-generation technologies such as Hadoop and MapReduce.

Yale computer science professor Daniel J. Abadi announced HadoopDB on his blog on Monday.

Abadi and his students built HadoopDB from several components: the open-source PostgreSQL database, the Apache Hadoop data-processing framework and Hive, the Hadoop-based data warehousing project created by Facebook Inc.

Queries can be submitted either as MapReduce jobs or in conventional SQL. MapReduce, the progenitor of Hadoop, is the programming model Google Inc. invented to process and index data on the scale of the entire World Wide Web.

Similarly, data processing is partly done in Hadoop and partly in "different PostgreSQL instances spread across many nodes in a shared-nothing cluster of machines," wrote Abadi.

"In essence, it is a hybrid of MapReduce and parallel DBMS technologies," he continued. But unlike already-developed projects and vendors such as Aster Data, Greenplum or Hive, HadoopDB "is not a hybrid simply at the language/interface level. It is a hybrid at a deeper, systems implementation level."

By combining the two approaches, HadoopDB can achieve the fault tolerance of massively parallel data infrastructures such as MapReduce, where a server failure has little effect on the overall grid, while performing complex analyses almost as quickly as existing commercial parallel databases, Abadi claims.
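The fault-tolerance point rests on a general MapReduce property: a job is split into many small tasks, so when a node fails, only that node's unfinished tasks need to be re-run elsewhere rather than restarting the whole query. The Python sketch below simulates that under simplifying assumptions (in-process "workers", a hypothetical `failed_workers` set, and a sum as the per-partition task). It illustrates the retry idea in general, not Hadoop's actual scheduler.

```python
class WorkerFailure(Exception):
    """Raised when a simulated worker is down."""


def run_task(partition, worker, failed_workers):
    # Hypothetical task: sum the numbers in one data partition.
    if worker in failed_workers:
        raise WorkerFailure(f"worker {worker} is down")
    return sum(partition)


def run_job(partitions, workers, failed_workers):
    """Schedule one task per partition, re-running a task elsewhere if its worker fails."""
    results = []
    for i, partition in enumerate(partitions):
        # Try the task's preferred worker first, then fall back to the others.
        start = i % len(workers)
        candidates = workers[start:] + workers[:start]
        for worker in candidates:
            try:
                results.append(run_task(partition, worker, failed_workers))
                break
            except WorkerFailure:
                continue  # only this one task is retried, not the whole job
        else:
            raise RuntimeError(f"no healthy worker for partition {i}")
    return sum(results)


if __name__ == "__main__":
    # Worker "b" is down; its task is re-run on "c" and the job still finishes.
    print(run_job([[1, 2], [3, 4], [5, 6]], ["a", "b", "c"], {"b"}))
```

With worker "b" down, the job still returns 21, the same result as a failure-free run; the cost of the failure is one re-executed task.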

The source code for HadoopDB is available now.

Abadi's solution, while experimental, could appeal to Web 2.0 firms and other members of the burgeoning 'NoSQL' movement.

It might eventually also appeal to enterprises looking for less-expensive, more scalable alternatives to Oracle's Database, IBM's DB2 or Microsoft's SQL Server.

Abadi was one of the co-authors of a research paper released in April that found that for most users and applications, relational databases still beat MapReduce and Hadoop.

In an e-mail, Abadi said that his current research doesn't repudiate the previous paper but does reach a strong conclusion: as databases continue to grow, systems such as HadoopDB will "scale much better than parallel databases."

Though built with PostgreSQL, HadoopDB can use other databases as engines. Abadi said his team has already used MySQL successfully and plans to try columnar databases such as Infobright and MonetDB to improve performance on analytical workloads.

"Although at this point this code is just an academic prototype and some ease-of-use features are yet to be implemented, I hope that this code will nonetheless be useful for your structured data analysis tasks!" Abadi said.
