Big three database vendors diverge on Hadoop
Three vendors, three different paths on dealing with the open-source data architecture
Computerworld - The three leaders of the relational database market are responding to the sudden mania for the data processing technology Hadoop in three very different ways.
While startups and established data warehousing vendors such as Sybase Inc. and Teradata Inc. are embracing Hadoop and its Google-developed progenitor, MapReduce, Microsoft Corp. is resisting it.
"We'd never bring Hadoop code into one of our products," said Microsoft technical fellow and University of Wisconsin-Madison professor David J. DeWitt.
DeWitt's lack of interest is not surprising. DeWitt is an academic expert in parallel SQL databases, having co-invented three of them. He co-authored a paper this spring that argued that SQL databases still beat MapReduce at most tasks. He hasn't changed his mind.
"Every database vendor wants to claim that they're doing Hadoop because it's the popular thing," he said. "There's too much FUD. SQL databases still work pretty well."
DeWitt leads a database research lab at Madison that is helping Microsoft with R&D for its upcoming Parallel Data Warehousing version of SQL Server 2008 R2, formerly known as Project Madison.
He said that the new edition of SQL Server will add some analytic functions that roughly mimic some of the features of MapReduce/Hadoop.
The additions come from technology developed by DATAllegro Inc., which Microsoft acquired, rather than from Hadoop, DeWitt said.
He does acknowledge, however, that MapReduce/Hadoop is better than SQL databases at keeping long-running queries from crashing.
Because of that, Microsoft may eventually try to incorporate those capabilities into future data warehousing-oriented versions of SQL Server, he said.
That would likely be a Microsoft-led effort, rather than a licensing of Hadoop's open-source code, which is managed by the Apache Software Foundation.
IBM is the leading corporate supporter of Apache. Perhaps unsurprisingly, it is also "very bullish on Hadoop," said Anant Jhingran, CTO of IBM's information management division in the software group.
"I'm not saying that mind-melding Hadoop with a database is the answer for everything," Jhingran said. "But in the end, I think every enterprise will want Hadoop. I'm just not sure in what form."
Questions remain about whether enterprises want Hadoop integrated into their SQL databases, as a separate data warehousing appliance, or as a Web-only service where Hadoop is hidden underneath, as with IBM's experimental M2 service.
To determine this, IBM is running pilots with a dozen enterprise customers, as well as doing R&D work in the lab, Jhingran said. He declined to comment on the likelihood of Hadoop functionality making it into the next version of DB2 or Informix.
One thing is certain, says Jhingran: Hadoop is best used to solve emerging problems such as Web analytics, fraud, and analysis of unstructured and semi-structured data, rather than the problems that relational databases have already proven to handle well.
"For those vendors who simply want to use Hadoop to build a database replacement, I think they will fall flat on their faces," he said. SQL technology "supports a $300 billion ecosystem. It's extremely robust. I'm not that young [at 46], but I'll be retired before SQL is retired."
Oracle Database stands to lose the most if MapReduce/Hadoop takes off, critics say.
That's not just because of Oracle's longtime lead in the relational database market, but also because of its database's poor reputation for scale-out -- a MapReduce/Hadoop strength.
Oracle did not respond to a request for comment. But in October, it published a blog post that argued, in the words of independent analyst Curt Monash, that "actually, we've been doing MapReduce all along."
A senior product manager at Oracle, Jean-Pierre Dijcks, said parallel processing of large data sets has been possible with Oracle Database using features first introduced with Oracle 9i back in 2001. He describes in detail how to implement it in a blog post.
"MapReduce in the end is a programming construct ... SQL will allow for massive parallel processing as well. It is all a matter of looking beyond hype and finding a solution you are comfortable with," Dijcks wrote.