
Yale researchers create database-Hadoop hybrid

HadoopDB offers the data-crunching prowess of a relational database with scalability

July 21, 2009 07:19 AM ET

Computerworld - Yale University researchers have released an open-source parallel database that they say combines the data-crunching prowess of a relational database with the scalability of next-generation technologies such as Hadoop and MapReduce.

HadoopDB was announced on Monday by Yale computer science professor Daniel J. Abadi on his blog.

Abadi and his students built HadoopDB from components that include the open-source database PostgreSQL, the Apache Hadoop distributed data-processing framework and Hive, the Hadoop-based project created at Facebook Inc.

Queries can be written either in MapReduce, the programming model that Google Inc. developed for large-scale jobs such as indexing the Web and that inspired Hadoop, or in conventional SQL.

Similarly, data processing is partly done in Hadoop and partly in "different PostgreSQL instances spread across many nodes in a shared-nothing cluster of machines," wrote Abadi.

"In essence, it is a hybrid of MapReduce and parallel DBMS technologies," he continued. But unlike already-developed projects and vendors such as Aster Data, Greenplum or Hive, HadoopDB "is not a hybrid simply at the language/interface level. It is a hybrid at a deeper, systems implementation level."

By combining the best of both approaches, HadoopDB can achieve the fault tolerance of massively parallel data infrastructures such as MapReduce, where a server failure has little effect on the overall grid. And it can perform complex analyses almost as quickly as existing commercial parallel databases, claims Abadi.
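
The division of labor Abadi describes can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not HadoopDB's code: in-memory SQLite databases stand in for the PostgreSQL instances, a plain Python function stands in for the Hadoop reduce phase, and the table, column and function names are all hypothetical. Each "node" holds its own slice of the data and answers a SQL aggregate locally; the partial results are then merged outside the databases.

```python
import sqlite3
from collections import defaultdict


def make_nodes(rows, num_nodes):
    """Spread (region, amount) rows round-robin over independent databases.

    Each in-memory SQLite connection is a stand-in for one PostgreSQL
    instance in a shared-nothing cluster (an assumption for illustration).
    """
    nodes = [sqlite3.connect(":memory:") for _ in range(num_nodes)]
    for conn in nodes:
        conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    for i, row in enumerate(rows):
        nodes[i % num_nodes].execute("INSERT INTO sales VALUES (?, ?)", row)
    return nodes


def map_on_node(conn):
    """Map-side work pushed into the database: a local SQL aggregate."""
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
    return conn.execute(query).fetchall()


def reduce_partials(partials):
    """Reduce-side work done outside the databases: merge per-node subtotals."""
    totals = defaultdict(int)
    for partial in partials:
        for region, subtotal in partial:
            totals[region] += subtotal
    return dict(totals)


def run_query(rows, num_nodes=3):
    """Partition the rows, aggregate on every node, then merge the results."""
    nodes = make_nodes(rows, num_nodes)
    try:
        return reduce_partials(map_on_node(conn) for conn in nodes)
    finally:
        for conn in nodes:
            conn.close()


if __name__ == "__main__":
    sample = [("east", 10), ("west", 5), ("east", 7), ("north", 1), ("west", 3)]
    print(run_query(sample))
```

Because no node shares state with another, the per-node step could be rerun elsewhere if a machine failed, which is the property the MapReduce side of such a hybrid is meant to provide.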

The source code for HadoopDB is available now.

Abadi's solution, while experimental, could appeal to Web 2.0 firms and other members of the burgeoning 'NoSQL' movement.

It might eventually also appeal to enterprises looking for less-expensive, more scalable alternatives to Oracle's Database, IBM's DB2 or Microsoft's SQL Server.

Abadi was one of the co-authors of a research paper released in April that found that for most users and applications, relational databases still beat MapReduce and Hadoop.

In an e-mail, Abadi said that his current research doesn't repudiate the previous paper, but comes to the strong conclusion that as databases continue to grow, systems such as HadoopDB will "scale much better than parallel databases."

Though built with PostgreSQL, HadoopDB can use other databases as its engine. Abadi said his team has already used MySQL successfully and plans to try columnar databases such as Infobright and MonetDB to improve performance on analytical workloads.

"Although at this point this code is just an academic prototype and some ease-of-use features are yet to be implemented, I hope that this code will nonetheless be useful for your structured data analysis tasks!" Abadi said.


Eric Lai
