
Hadoop challenger works to add developers

LexisNexis says benchmark testing found its HPCC System is much faster on less hardware than Hadoop

December 22, 2011 06:19 AM ET

Computerworld - LexisNexis has worked for more than a decade to develop a large-scale system for Big Data manipulation, and it believes that it has produced something that's better and more mature than the better-known Hadoop technology.

The company just needs developers to agree.

LexisNexis developed the parallel processing data platform to handle the demands of its own data-intensive research business. It wants to extend use of the technology, dubbed HPCC Systems, to broader markets, but is clearly aware that open-source Hadoop has already established itself as a strong presence.

The company has open sourced the HPCC platform, and says it is challenging Hadoop in benchmarks.

The company says there are now about 1,000 HPCC Systems developers worldwide, most of whom have been trained since the platform was open sourced in June.

By contrast, a Hadoop developer conference last summer drew a crowd of some 1,700.

To help demonstrate the platform's capabilities, LexisNexis ran a Terasort benchmark on HPCC and compared the result with a similar benchmark and workload that SGI ran on a Hadoop cluster and announced in October.

LexisNexis says its benchmark was 25% faster, and ran on far less hardware: a four-node cluster versus a 20-node cluster for the SGI system. The LexisNexis test ran on two-socket Dell PowerEdge servers with six-core Intel Xeon processors.

Flavio Villanustre, vice president of infrastructure and products at LexisNexis Risk Solutions, attributed the test results, in part, to the number of lines of code needed for the sorting compared with Hadoop.

LexisNexis developed its own language, ECL, for this system.

It took three lines of ECL code to do the sorting, compared with 100-plus lines in Java, the language used in Hadoop, said Villanustre.
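For readers unfamiliar with the workload, a Terasort-style benchmark sorts a large set of fixed-size records by a short key. The sketch below is only an illustration in Python on a single machine, not the ECL or Java code used in either benchmark. It assumes the commonly used record layout of 100 bytes with a 10-byte key, and the function names are hypothetical.

```python
import os

# Assumed Terasort-style layout: each record is 100 bytes,
# and the first 10 bytes are the sort key.
KEY_BYTES = 10
RECORD_BYTES = 100


def make_records(count):
    """Generate `count` random records (hypothetical helper for illustration)."""
    return [os.urandom(RECORD_BYTES) for _ in range(count)]


def sort_records(records):
    """Return the records ordered by their leading key bytes."""
    return sorted(records, key=lambda record: record[:KEY_BYTES])


def is_sorted(records):
    """Check that every key is <= the key of the record that follows it."""
    return all(
        a[:KEY_BYTES] <= b[:KEY_BYTES] for a, b in zip(records, records[1:])
    )


if __name__ == "__main__":
    data = make_records(1000)
    ordered = sort_records(data)
    print(len(ordered), is_sorted(ordered))
```

The sort itself is a single expression here because the language supplies it. The benchmarked systems must also partition the data across cluster nodes, which this single-machine sketch does not attempt.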

Asked to respond to the HPCC benchmark, Bill Mannel, vice president of product marketing at SGI, said in a statement that "there are many variations of distributed processing which can run Terasort. HPCC Systems is running Terasort on ECL code, which is different than SGI running on a MapReduce-based Hadoop. SGI remains committed to pushing the bar on performance and beating and improving our own record." MapReduce is a software framework for processing large data sets in parallel across a cluster.

Villanustre believes HPCC could do well in the marketplace against Hadoop, but he doesn't take anything for granted. He said that he wants to avoid ending up like Betamax, which lost the video format wars to VHS, or IBM's OS/2 operating system, which was crushed by Microsoft Windows.

"We want to ensure adoption and that's why we are pushing so much," said Villanustre.

The company has also made its HPCC system available in the cloud via Amazon Web Services.
