Yale researchers create database-Hadoop hybrid
HadoopDB offers the data-crunching prowess of a relational database with scalability
Computerworld - Yale University researchers have released an open-source parallel database that they say combines the data-crunching prowess of a relational database with the scalability of next-generation technologies such as Hadoop and MapReduce.
HadoopDB was announced on Monday by Yale computer science professor Daniel J. Abadi on his blog.
Abadi and his students built HadoopDB from components including the open-source PostgreSQL database, the Apache Hadoop data-processing framework and Hive, the Hadoop-based data warehouse project created by Facebook Inc.
Queries are accepted either as MapReduce jobs or in conventional SQL. MapReduce is the programming model, developed by Google Inc. for large-scale jobs such as indexing the World Wide Web, that Hadoop implements.
Similarly, data processing is partly done in Hadoop and partly in "different PostgreSQL instances spread across many nodes in a shared-nothing cluster of machines," wrote Abadi.
"In essence, it is a hybrid of MapReduce and parallel DBMS technologies," he continued. But unlike already-developed projects and vendors such as Aster Data, Greenplum or Hive, HadoopDB "is not a hybrid simply at the language/interface level. It is a hybrid at a deeper, systems implementation level."
By combining the best of both approaches, HadoopDB can achieve the fault tolerance of massively parallel data infrastructures such as MapReduce, where a server failure has little effect on the overall grid. And it can perform complex analyses almost as quickly as existing commercial parallel databases, claims Abadi.
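The division of labor Abadi describes can be illustrated with a toy sketch. It is not HadoopDB code: Python's built-in sqlite3 module stands in for the per-node PostgreSQL instances, a plain Python loop stands in for Hadoop, and the table, the data and every function name are hypothetical. Each "node" answers a SQL aggregate over its own partition, and a reduce step merges the partial results.

```python
import sqlite3
from collections import defaultdict


def make_node(rows):
    """Create one independent in-memory database holding one data partition.

    Assumption: sqlite3 is only a stand-in for a per-node PostgreSQL instance.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def map_phase(node):
    """Push the SQL aggregate down to a single node's local database."""
    cursor = node.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    )
    return cursor.fetchall()


def reduce_phase(partials):
    """Merge the per-node partial sums into global totals."""
    totals = defaultdict(int)
    for partial in partials:
        for region, subtotal in partial:
            totals[region] += subtotal
    return dict(totals)


def run_query(nodes):
    """Run the local query on every node, then combine the results."""
    return reduce_phase(map_phase(node) for node in nodes)


# Three shared-nothing "nodes", each holding a different slice of the data.
nodes = [
    make_node([("east", 10), ("west", 5)]),
    make_node([("east", 7), ("north", 3)]),
    make_node([("west", 8)]),
]
result = run_query(nodes)
print(result)
```

In this sketch, a lost node would affect only its own partition's partial result, which is the property a MapReduce-style scheduler exploits when it reruns failed tasks elsewhere. The sketch itself does no rescheduling.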
The source code for HadoopDB is available now.
Abadi's solution, while experimental, could appeal to Web 2.0 firms and other members of the burgeoning 'NoSQL' movement.
It might eventually also appeal to enterprises looking for less-expensive, more scalable alternatives to Oracle's Database, IBM's DB2 or Microsoft's SQL Server.
Abadi was one of the co-authors of a research paper released in April that found that for most users and applications, relational databases still beat MapReduce and Hadoop.
In an e-mail, Abadi said that his current research doesn't repudiate the previous paper, but comes to the strong conclusion that as databases continue to grow, systems such as HadoopDB will "scale much better than parallel databases."
Though built with PostgreSQL, HadoopDB can use other databases as its engine. Abadi said his team has already used MySQL successfully and plans to try columnar databases such as Infobright and MonetDB to improve performance on analytical workloads.
"Although at this point this code is just an academic prototype and some ease-of-use features are yet to be implemented, I hope that this code will nonetheless be useful for your structured data analysis tasks!" Abadi said.