Hadoop challenger works to add developers

LexisNexis says benchmark testing found its HPCC Systems platform is much faster than Hadoop while running on less hardware

LexisNexis has worked for more than a decade to develop a large-scale system for Big Data manipulation, and it believes it has produced something better and more mature than the better-known Hadoop technology.

The company just needs developers to agree.

LexisNexis developed the parallel-processing data platform to handle the demands of its own data-intensive research business. It wants to extend use of the technology, dubbed HPCC Systems, to broader markets, but it is clearly aware that open-source Hadoop has already established a strong presence.

The company has open-sourced the HPCC platform and says it is challenging Hadoop in benchmarks.

The company says there are now about 1,000 HPCC Systems developers worldwide, most of whom have been trained since the platform was open-sourced in June.

By contrast, a Hadoop developer conference last summer drew a crowd of some 1,700.

To help demonstrate the platform's capabilities, LexisNexis ran a Terasort benchmark to compare HPCC with a similar benchmark and workload that SGI ran on a Hadoop cluster and announced in October.

LexisNexis says its benchmark was 25% faster and ran on far less hardware: a four-node cluster versus a 20-node cluster for the SGI system. The LexisNexis test ran on two-socket Dell PowerEdge servers with six-core Intel Xeon processors.

Flavio Villanustre, vice president of infrastructure and products at LexisNexis Risk Solutions, attributed the test results, in part, to the number of lines of code needed for the sort compared with Hadoop.

LexisNexis developed its own language, ECL, for this system.

It took three lines of ECL code to do the sorting, compared with more than 100 lines in Java, the language used in Hadoop, said Villanustre.

Asked to respond to the HPCC benchmark, Bill Mannel, vice president of product marketing at SGI, said in a statement that "there are many variations of distributed processing which can run Terasort. HPCC Systems is running Terasort on ECL code, which is different than SGI running on a MapReduce-based Hadoop. SGI remains committed to pushing the bar on performance and beating and improving our own record." MapReduce is the software framework Hadoop uses to process data in parallel across a cluster.

Villanustre believes HPCC could do well in the marketplace against Hadoop, but he doesn't take anything for granted. He said he wants to avoid ending up like Betamax, which lost the video format wars to VHS, or IBM's OS/2 operating system, which was crushed by Microsoft Windows.

"We want to ensure adoption and that's why we are pushing so much," said Villanustre.

The company has also made its HPCC system available in the cloud via Amazon Web Services.

The platform is available through a dual-licensing strategy that offers both a community edition and a commercial enterprise edition.

Matt Aslett, an analyst at The 451 Group, believes LexisNexis can be a lot more aggressive "given the large and growing ecosystem of developers and vendors that has formed around Apache Hadoop."

Specifically, Aslett believes the dual-licensing strategy enables the company to protect the code from forking and to generate revenue from adopters, "but dual licensing strategies have traditionally not been very successful at generating a developer community."

Aslett said that "releasing the software under a more permissive license or contributing it to an established open source foundation would have been more likely to drive developer adoption."

Bruce Perens, a leading open-source advocate and a strategic consultant to LexisNexis, developed the licensing approach, called the Covenant, for the HPCC Systems platform. He agrees that dual-licensing strategies have had a mixed history, but says the HPCC licensing approach is designed to address that problem.

Perens said the present version of the code will always remain open and there's no way to withdraw an open source license. "One assigns code to HPCC only if one wishes HPCC to maintain it from then on - which, of course, is very desirable," he said.

Under the Covenant, each time a developer contributes code and assigns the copyright to the company, that contributor receives a three-year guarantee that the HPCC code will remain open source.

The three-year provision "is a guarantee to help developers be confident about the destiny of their contribution, not a way of holding the project at ransom," said Perens, in an email response to questions.

"HPCC always has the option to go to a less restrictive license if dual-licensing doesn't work for them, but this is not expected," said Perens. Everybody loves to get a gift, "but it's not always fair to the party that writes the code" to give it as a no-strings-attached gift to competitors.

Perens argues that dual licensing brings some economic sense to open source, and "the covenant repairs the community side of dual licensing," he said.

Patrick Thibodeau covers SaaS and enterprise applications, outsourcing, government IT policies, data centers and IT workforce issues for Computerworld. Follow Patrick on Twitter at @DCgov. His e-mail address is pthibodeau@computerworld.com.

Copyright © 2011 IDG Communications, Inc.