Google claims MapReduce sets data-sorting record, topping Yahoo, conventional databases
Computerworld - Google Inc. said late last week that the results of in-house data-sorting tests bolster its claim that its MapReduce technology can manipulate more data faster than any conventional database.
According to a Friday afternoon blog post by Grzegorz Czajkowski, a member of Google's systems infrastructure team, MapReduce recently sorted 1 terabyte of data in 68 seconds, or about a third of the time Yahoo Inc. achieved this summer.
Sorting or rearranging data is one of the most basic functions of a spreadsheet, database or other data-manipulation software.
Google used 1,000 servers running MapReduce in parallel to sort the data, versus 910 for Yahoo, according to Czajkowski.
Google also tested MapReduce's ability to sort 1 petabyte, or 1,000 TB, of data. That is equivalent to 12 times the amount of archived Web data in the U.S. Library of Congress as of May 2008, according to Google.
Using 4,000 servers, which is likely a small fraction of Google's entire worldwide server infrastructure, MapReduce took 6 hours, 2 minutes to sort 1 PB, according to Czajkowski.
"We're not aware of any other sorting experiment at this scale and are obviously very excited to be able to process so much data so quickly," he wrote.
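To put the two reported runs on a common footing, the figures can be converted into rough throughput numbers. The sketch below is only back-of-the-envelope arithmetic on the numbers quoted in this article. It assumes decimal units (1 TB = 10^12 bytes, 1 PB = 1,000 TB) and an even spread of work across servers, neither of which Google's post is confirmed to state, and the function name `sort_throughput` is purely illustrative.

```python
def sort_throughput(data_bytes, seconds, servers):
    """Return (aggregate bytes/sec, bytes/sec per server) for one sort run."""
    aggregate = data_bytes / seconds
    return aggregate, aggregate / servers


# Assumption: decimal units, as implied by "1 petabyte, or 1,000 TB".
TB = 10**12
PB = 1000 * TB

# 1 TB in 68 seconds on 1,000 servers.
tb_agg, tb_per_server = sort_throughput(TB, 68, 1000)

# 1 PB in 6 hours, 2 minutes on 4,000 servers.
pb_seconds = 6 * 3600 + 2 * 60
pb_agg, pb_per_server = sort_throughput(PB, pb_seconds, 4000)

print(f"1 TB run: {tb_agg / 1e9:.1f} GB/s total, {tb_per_server / 1e6:.1f} MB/s per server")
print(f"1 PB run: {pb_agg / 1e9:.1f} GB/s total, {pb_per_server / 1e6:.1f} MB/s per server")
```

Under these assumptions the petabyte run moves more data per second in total but somewhat less per server than the terabyte run. The comparison is indicative only, since the article does not say whether the hardware was the same in both tests.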
Czajkowski did not say when the tests were done. He did reveal that as of early January this year, Google was processing an average of 20 PB total per day.
Google's announcement appeared to be deliberately timed to coincide with a speech by David DeWitt, a noted database expert and MapReduce critic.
A former computer science professor at the University of Wisconsin, Madison, DeWitt joined Microsoft this spring to run a new research lab being created on the Madison campus.
The lab will focus on helping Microsoft's SQL Server "scale out" in order to run on hundreds or thousands of servers at a time. That will allow customers to run parallel database clusters similar technically to Google's, though nowhere near the latter's scale.
Early this year, DeWitt, along with database industry legend Michael Stonebraker, co-wrote a blog post arguing that MapReduce was a "sub-optimal ... not novel" type of database that lacked many features modern database administrators and developers take for granted, and that it was unworthy of the hype it had received.
In an interview last week with Computerworld, DeWitt praised MapReduce's scalability and hardiness.
But DeWitt also stood firm on MapReduce's shortcomings. He and Stonebraker are also submitting a paper to the Association for Computing Machinery (ACM) that compares the performance of several databases, including IBM's DB2 and Stonebraker's Vertica, with MapReduce and a similar nonrelational data engine, Apache Hadoop. That paper may be publicly available as early as late January, said DeWitt.
DeWitt gave a keynote speech on Friday at the Professional Association for SQL Server's (PASS) conference in Seattle.
He did not directly criticize MapReduce during his PASS keynote speech, according to blog reports.