Big three database vendors diverge on Hadoop
Three vendors, three different paths on dealing with the open-source data architecture
Computerworld - The three leaders of the relational database market are responding to the sudden mania for the data processing technology Hadoop in three very different ways.
"We'd never bring Hadoop code into one of our products," said Microsoft technical fellow and University of Wisconsin-Madison professor David J. DeWitt.
DeWitt's lack of interest is not surprising. DeWitt is an academic expert in parallel SQL databases, having co-invented three of them. He co-authored a paper this spring that argued that SQL databases still beat MapReduce at most tasks. He hasn't changed his mind.
"Every database vendor wants to claim that they're doing Hadoop because it's the popular thing," he said. "There's too much FUD. SQL databases still work pretty well."
DeWitt leads a database research lab at Madison that is helping Microsoft with R&D for its upcoming Parallel Data Warehousing version of SQL Server 2008 R2, formerly known as Project Madison.
As such, he said that the new edition of SQL Server will add some analytic functions that roughly mimic some of the features of MapReduce/Hadoop.
The additions come from technology developed by DATAllegro Inc., which Microsoft acquired, rather than from Hadoop, DeWitt said.
He does acknowledge, however, that MapReduce/Hadoop is better than SQL at keeping long-running queries from crashing.
Because of that, Microsoft may eventually try to incorporate those capabilities into future data warehousing-oriented versions of SQL Server, he said.
That would likely be a Microsoft-led effort, rather than a licensing of Hadoop's open-source code, which is managed by the Apache Software Foundation.
IBM is the leading corporate supporter of Apache. Perhaps unsurprisingly, it is also "very bullish on Hadoop," said Anant Jhingran, CTO of IBM's information management division in the software group.
"I'm not saying that mind-melding Hadoop with a database is the answer for everything," Jhingran said. "But in the end, I think every enterprise will want Hadoop. I'm just not sure in what form."
Questions remain about whether enterprises want Hadoop integrated into their SQL databases, as a separate data warehousing appliance, or as a Web-only service where Hadoop is hidden underneath, as with IBM's experimental M2 service.
To determine this, IBM is running pilots with a dozen enterprise customers, as well as doing R&D work in the lab, Jhingran said. He declined to comment on the likelihood of Hadoop functionality making it into the next version of DB2 or Informix.
One thing is certain, says Jhingran: Hadoop is best used to solve emerging problems such as Web analytics, fraud detection, and analysis of unstructured and semi-structured data, rather than the problems that relational databases have already proven to excel at.
"For those vendors who simply want to use Hadoop to build a database replacement, I think they will fall flat on their faces," he said. SQL technology "supports a $300 billion ecosystem. It's extremely robust. I'm not that young [at 46], but I'll be retired before SQL is retired."
Oracle did not respond to a request for comment. But in October, it published a blog post that argued, in the words of independent analyst Curt Monash, that "actually, we've been doing MapReduce all along."
A senior product manager at Oracle, Jean-Pierre Dijcks, said parallel processing of large data sets has been possible with Oracle Database using features first introduced with Oracle 9i back in 2001. He describes in detail how to implement it in the blog post.
"MapReduce in the end is a programming construct ... SQL will allow for massive parallel processing as well. It is all a matter of looking beyond hype and finding a solution you are comfortable with," Dijcks wrote.
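Dijcks' point that MapReduce is a programming construct, and that SQL can express the same kind of aggregation, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the sales records, the function names, and the use of Python with an in-memory SQLite database. It is not Oracle's implementation, not Hadoop code, and it runs on a single machine, so it shows only the shape of the computation, not parallel execution.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (region, amount) pairs.
SALES = [("east", 10), ("west", 5), ("east", 7)]


def map_phase(records):
    """Map step: emit one (key, value) pair per input record."""
    for region, amount in records:
        yield region, amount


def reduce_phase(pairs):
    """Reduce step: combine all values that share a key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def sql_totals(records):
    """The same aggregation expressed as a SQL GROUP BY query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()
    conn.close()
    return dict(rows)


if __name__ == "__main__":
    print(reduce_phase(map_phase(SALES)))
    print(sql_totals(SALES))
```

Both functions return the same per-region totals. In a real deployment, the differences the vendors are debating lie in how the work is distributed across machines and how failures are handled, which this sketch does not model.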