Skip the navigation
News

Researchers: Databases still beat Google's MapReduce

Speed, efficiency of parallel SQL databases superior, paper shows

By Eric Lai
April 13, 2009 12:00 PM ET

Computerworld - A team of researchers will release on Tuesday a paper showing that parallel SQL databases perform up to 6.5 times faster than Google Inc.'s MapReduce data-crunching technology.

Google bypassed parallel databases and invented MapReduce as a way to index the World Wide Web on its global grid of low-end PC servers. As of January 2008, Google has used MapReduce to process 20 petabytes of data a day.

In results of in-house tests published last November, Google used MapReduce running on 1,000 servers to sort 1TB of data in just 68 seconds.

Such results have won MapReduce and its open-source version Hadoop many fans, who argue that the technology is already superior to the 40-year-old relational one for large-scale grids such as for cloud-computing infrastructures, and will eventually render databases obsolete for other tasks.

Microsoft technical fellow David DeWitt and Michael Stonebraker, a database industry legend and chief technology officer at Vertica Systems Inc., who co-authored the paper, have previously argued that MapReduce lacks many key features already standard to databases and was generally a "major step backward."

The paper, titled "A Comparison of Approaches to Large-Scale Data Analysis," viewable here. It is sure to stoke heated discussion among data junkies over the technical merits of each approach. It will be published by the Association for Computing Machinery (ACM), a 92,000-member IT society, in the June 29-July 2 issue of its SIGMOD Record journal of data management.

In addition to DeWitt and Stonebraker, five researchers from Brown University, Yale University, MIT and the University of Wisconsin co-authored the report.

In the paper, DeWitt and Stonebraker put meat on their argument by testing two 100-node parallel, "shared-nothing" database clusters, one running the column-based Vertica and another running a row-based database from "a major relational vendor," against a similarly configured MapReduce one of the same size. Servers had 2.4-GHz Intel Core 2 Duo processors running 64-bit Red Hat Enterprise Linux with 4GB of RAM and two 250GB SATA-I hard drives all connected by Gigabit Ethernet ports.

Their conclusion? Databases "were significantly faster and required less code to implement each task, but took longer to tune and load the data," the researchers write. Database clusters were between 3.1 and 6.5 times faster on a "variety of analytic tasks."

MapReduce also requires developers to write features or perform tasks manually that can be done automatically by most SQL databases, they wrote.

MapReduce may be "well suited for development environments with a small number of programmers and a limited application domain," they said. "This lack of constraints, however, may not be appropriate for longer-term and larger-sized projects."



Additional Resources
Forrester Consulting - Optimizing Users and Applications in a Mobile World
WHITE PAPER
Solving application issues over the WAN requires careful consideration. Based on their independent research, Forrester Consulting offers recommendations on how to tackle application performance issues, insufficient bandwidth and the inability to quickly restore users in a disaster.

Read now.

Security KnowledgeVault
WHITE PAPER
Security is not an option. This KnowledgeVault Series offers professional advice how to be proactive in the fight against cybercrimes and multi-layered security threats; how to adopt a holistic approach to protecting and managing data; and how to hire a qualified security assessor. Make security your Number 1 priority.

Read now.

Cut Communications Costs Once and for All
WHITE PAPER
New IP-based communications systems are being deployed by small and midsized businesses at a rapid rate. Learn how these organizations are enabling faster responsiveness, creating better customer experiences, speeding office or mobile interactions, and dramatically reducing existing communications costs.

Read now.

Databases White Papers
HP Advanced Information Services for SAP In-Memory Appliance (SAP HANA)
Organizations are eager to connect the vast amounts of data available within and outside their businesses to compete more effectively and make better...
Galliker builds next-generation Cisco data center
Originally Galliker Transport AG only intended to upgrade its bandwidth to 10 gigabit per second in the core network of the data center...
Oracle TimesTen In-Memory Database on Oracle Exalogic Elastic Cloud
This white paper describes configuration considerations, best practices and performance results of TimesTen running on Exalogic.
Overcome Top 7 Admin Challenges of Active Directory
As Active Directory's role in the enterprise has drastically increased, so has the need to secure the data. Gain insight on creating repeatable,...
Insiders Can Ruin Your Company. Take Action.
Did you know that 80 percent of threats to an organization come from the inside? The threat from insiders is often overlooked in...
All Databases White Papers
Databases Webcasts
Oracle Database Appliance - Simplifying your High Availability Database
Date: February 29, 2012
Time: 1:00 PM EST

Seasoned IT managers know from experience that in many cases the bulk of the cost of an...
Optimizing Networks for the Cloud
Join guest speaker, Rohit Mehra, IDC Director of Enterprise Communications Infrastructure, to explore current trends, discuss best practices for optimizing Data Center and...
Apps QuickStart Series Part 2: Designing and Deploying SQL Server on VMware vSphere
Download this webcast to learn about the design considerations for virtualizing SQL workloads, performance and scalability information and high-availability options, as well as...
Apps QuickStart Series Part 1: Designing and Deploying Exchange 2010 on VMware vSphere
Download this webcast to learn the virtual hardware design considerations for Exchange 2010, deployment using the building block approach, options for high-availability and...
Customer Spotlight: How IPC The Hospitalist Company Implemented Oracle on VMware
Have you been looking to hear about customer's experiences with the new VMware vCenter Site Recovery Manager product? View this webcast to learn...
All Databases Webcasts
Newsletter Sign-Up

Receive the latest news test, reviews and trends on your favorite technology topics

Choose a newsletter
  1. View all newsletters | Privacy Policy
IT Jobs