How Hadoop startup Cloudera is evolving

Cloudera Inc. is tweaking its business model.

The company started life as the Red Hat for Hadoop -- a provider of paid support for the open-source data management platform.

Last fall, the Burlingame, Calif. startup released its first product -- Cloudera Desktop, a management console.

Since then, it has also quietly released a proprietary data integration app. It "doesn't replace an Informatica or Ab Initio," says Cloudera CEO Mike Olson, but it does provide extract and transform features.

The data integration app will be formally released this quarter as part of the overarching Cloudera Data Platform. No price has been determined yet, said Olson.

It's only one of the capabilities that Cloudera is feverishly working on -- analytics and BI dashboards are another -- to make its version of Hadoop as easy to use for mainstream corporate workers as SQL-based Business Intelligence tools.

"MicroStrategy, Business Objects, Oracle , IBM DB2 Parallel Edition -- these products are all powerful and wonderfully easy to use for the business analyst," Olson said. By contrast, Hadoop remains something that tends to intimidate all but "hardcore Java hackers."

"Hadoop needs to be made easier. It's powerful, but requires a fair bit of programming," he said.

Cloudera counts 30 customers today, most of them in government, financial services and retail, said Olson. They include LinkedIn, eHarmony, JP Morgan Chase, and many of the other companies that presented at the inaugural HadoopWorld conference last fall.

Cloudera, which has raised $11 million via two rounds of venture funding, plans to double its 27-employee headcount this year to help turn on mainstream enterprises to NoSQL alternatives such as Hadoop and its progenitor, MapReduce.

"Our goal in 2010 is to demonstrate to enterprises who haven't seen Hadoop before how you can get more value out of data already collected in your relational databases -- which you would leave in place -- by combining it with new data types," he said.

While Olson grants that SQL is an easier and more powerful environment for many users today, he says Hadoop will soon catch up because its developers "are innovating much faster."

"Why don't we see how long it takes for Oracle to make another major release?" he said.

Hadoop is better at crunching disparate data types than relational-based data marts or data warehouses, which force you to create a schema for the data upfront.

Hadoop also scales better, Olson argues, saying there are a number of Hadoop clusters storing data "well-known to be multiple petabytes in size." He declined to name those companies or say whether they are Cloudera customers.

Despite the potential of the Hadoop technology to serve as a scalable, universal data store, Olson sees it complementing, not competing with, relational databases.

"It kinda sucked to compete with Larry Ellison," said Olson, referring to his former firm, SleepyCat Software, embedded database maker BerkeleyDB, which was acquired by Oracle in 2006. "I finally managed to sell the guy a company. So I don't want to [compete with] him again."

Cloudera also works closely with Vertica Systems Inc. to enable users to connect data stored in Vertica's SQL-based data warehouse with Cloudera, and vice-versa.

Olson differentiated Cloudera's offering from those of relational data warehouse vendors such as Greenplum Inc. and Aster Data Systems, which have introduced MapReduce/Hadoop features.

"What Aster Data and Greenplum have is not MapReduce in my's tied only to relational data, not general data," he said. "The reason you would choose Greenplum [MapReduce] is because you'd already be a Greenplum customer, not because you wanted MapReduce."

Eric Lai covers Windows and Linux, desktop applications, databases and business intelligence for Computerworld. Follow Eric on Twitter at @ericylai or subscribe to Eric's RSS feed.

Copyright © 2010 IDG Communications, Inc.
