Big three database vendors diverge on Hadoop
Three vendors, three different paths on dealing with the open-source data architecture
Computerworld - The three leaders of the relational database market are responding to the sudden mania for the data processing technology Hadoop in three very different ways.
"We'd never bring Hadoop code into one of our products," said Microsoft technical fellow and University of Wisconsin-Madison professor David J. DeWitt.
DeWitt's lack of interest is not surprising. DeWitt is an academic expert in parallel SQL databases, having co-invented three of them. He co-authored a paper this spring that argued that SQL databases still beat MapReduce at most tasks. He hasn't changed his mind.
"Every database vendor wants to claim that they're doing Hadoop because it's the popular thing," he said. "There's too much FUD. SQL databases still work pretty well."
DeWitt leads a database research lab at Madison that is helping Microsoft with R&D for its upcoming Parallel Data Warehousing version of SQL Server 2008 R2, formerly known as Project Madison.
As such, he said that the new edition of SQL Server will add some analytic functions that roughly mimic some of the features of MapReduce/Hadoop.
The additions are the result of incorporating technology from DATAllegro Inc., which Microsoft acquired, not Hadoop, DeWitt said.
He said does acknowledge, however, that MapReduce/Hadoop is better at keeping long-running queries from crashing than SQL.
Because of that, Microsoft may eventually try to incorporate those capabilities into future data warehousing-oriented versions of SQL Server, he said.
That would likely be a Microsoft-led effort, rather than a licensing of Hadoop's open-source code, which is managed by the Apache Software Foundation.
IBM is the leading corporate supporter of Apache. Perhaps unsurprisingly, it is also "very bullish on Hadoop," said Anant Jhingran, CTO of IBM's information management division in the software group.
"I'm not saying that mind-melding Hadoop with a database is the answer for everything," Jhingran said. "But in the end, I think every enterprise will want Hadoop. I'm just not sure in what form."
Questions remain about whether enterprises want Hadoop integrated into their SQL databases, as a separate data warehousing appliance, or as a Web-only service where Hadoop is hidden underneath, as with IBM's experimental M2 service.
To determine this, IBM is running pilots with a dozen enterprise customers, as well as doing R&D work in the lab, Jhingran said. He declined to comment on the likelihood of Hadoop functionality making it into the next version of DB2 or Informix.
One thing is for certain, says Jhingran: Hadoop is best used to solve emerging problems such as Web analytics, fraud, and analysis of unstructured and semi-structured data, rather than the problems that relational databases have already proven to excel with.
"For those vendors who simply want to use Hadoop to build a database replacement, I think they will fall flat on their faces," he said. SQL technology "supports a $300 billion ecosystem. It's extremely robust. I'm not that young [at 46], but I'll be retired before SQL is retired."
Oracle did not respond to a request for comment. But in October, it published a blog which argued, in the words of independent analyst Curt Monash, that "actually, we've been doing MapReduce all along."
A senior product manager at Oracle, Jean-Pierre Dijcks, said parallel processing of large data sets been possible with Oracle Database using features first introduced with Oracle 9i back in 2001. He describes in detail how to implement it in a blog post.
"MapReduce in the end is a programming construct ... SQL will allow for massive parallel processing as well. It is all a matter of looking beyond hype and finding a solution you are comfortable with," Dijcks wrote.
Read more about Databases in Computerworld's Databases Topic Center.
- Patient Portals: A Platform for Connecting Communities of Care Connecting patient health data across the care continuum is essential to achieve improved care, increased access to personal health records and lowered costs.
- Aberdeen Group: Marketing Analytics for Manufacturing: Forging Customer Insights There are no recalls for poor marketing. Manufacturers need to get their customer intelligence and messaging right the first time. Learn how.
- Path Selection Infographic Path Selection Infographic
- Hyperconvergence Infographic A wide range of observers agree that data centers are now entering an era of "hyperconvergence" that will raise network traffic levels faster...
- Cloud Knowledge Vault Learn how your organization can benefit from the scalability, flexibility, and performance that the cloud offers through the short videos and other resources...
- LIVE EVENT: 5/7, The End of Data Protection As We Know It. Introducing a Next Generation Data Protection Architecture. Traditional backup is going away, but where does this leave end-users? All Databases White Papers | Webcasts