Big three database vendors diverge on Hadoop

Three vendors, three different paths on dealing with the open-source data architecture

By Eric Lai
December 16, 2009 07:49 PM ET

Computerworld - The three leaders of the relational database market are responding to the sudden mania for the data processing technology Hadoop in three very different ways.

While startups and established data warehousing vendors such as Sybase Inc. and Teradata Inc. are embracing Hadoop and its Google-developed progenitor, MapReduce, Microsoft Corp. is resisting it.

"We'd never bring Hadoop code into one of our products," said Microsoft technical fellow and University of Wisconsin-Madison professor David J. DeWitt.

DeWitt's lack of interest is not surprising. DeWitt is an academic expert in parallel SQL databases, having co-invented three of them. He co-authored a paper this spring that argued that SQL databases still beat MapReduce at most tasks. He hasn't changed his mind.

"Every database vendor wants to claim that they're doing Hadoop because it's the popular thing," he said. "There's too much FUD. SQL databases still work pretty well."

DeWitt leads a database research lab at Madison that is helping Microsoft with R&D for its upcoming Parallel Data Warehousing version of SQL Server 2008 R2, formerly known as Project Madison.

As such, he said that the new edition of SQL Server will add some analytic functions that roughly mimic some of the features of MapReduce/Hadoop.

The additions come from technology Microsoft gained by acquiring DATAllegro Inc., not from Hadoop, DeWitt said.

He does acknowledge, however, that MapReduce/Hadoop is better than SQL databases at keeping long-running queries from crashing.

Because of that, Microsoft may eventually try to incorporate those capabilities into future data warehousing-oriented versions of SQL Server, he said.

That would likely be a Microsoft-led effort, rather than a licensing of Hadoop's open-source code, which is managed by the Apache Software Foundation.

IBM is the leading corporate supporter of Apache. Perhaps unsurprisingly, it is also "very bullish on Hadoop," said Anant Jhingran, CTO of IBM's information management division in the software group.

"I'm not saying that mind-melding Hadoop with a database is the answer for everything," Jhingran said. "But in the end, I think every enterprise will want Hadoop. I'm just not sure in what form."

Questions remain about whether enterprises want Hadoop integrated into their SQL databases, as a separate data warehousing appliance, or as a Web-only service where Hadoop is hidden underneath, as with IBM's experimental M2 service.

To determine this, IBM is running pilots with a dozen enterprise customers, as well as doing R&D work in the lab, Jhingran said. He declined to comment on the likelihood of Hadoop functionality making it into the next version of DB2 or Informix.

One thing is for certain, says Jhingran: Hadoop is best used to solve emerging problems such as Web analytics, fraud, and analysis of unstructured and semi-structured data, rather than the problems that relational databases have already proven to excel at.

"For those vendors who simply want to use Hadoop to build a database replacement, I think they will fall flat on their faces," he said. SQL technology "supports a $300 billion ecosystem. It's extremely robust. I'm not that young [at 46], but I'll be retired before SQL is retired."

Oracle Database stands to lose the most if MapReduce/Hadoop takes off, critics say.

That's not just because of Oracle's longtime lead in the relational database market, but also because of its database's poor reputation for scale-out -- a MapReduce/Hadoop strength.

Oracle did not respond to a request for comment. But in October, it published a blog post that argued, in the words of independent analyst Curt Monash, that "actually, we've been doing MapReduce all along."

A senior product manager at Oracle, Jean-Pierre Dijcks, said parallel processing of large data sets has been possible with Oracle Database using features first introduced with Oracle 9i back in 2001. He describes in detail how to implement it in a blog post.

"MapReduce in the end is a programming construct ... SQL will allow for massive parallel processing as well. It is all a matter of looking beyond hype and finding a solution you are comfortable with," Dijcks wrote.
