
EMC joins forces with Hadoop distributor MapR Technologies

It will use MapR's technology as part of a pre-tested Hadoop software stack

May 25, 2011 01:26 PM ET

Computerworld - EMC today formally announced a reseller partnership with MapR Technologies, a start-up that plans to sell a proprietary MapReduce product based on Apache Hadoop.

To date, MapR has been in development mode. The company has 15 beta customers testing its product, which will be sold both as software and as a stand-alone appliance.

"With the EMC deal, we get worldwide distribution," said John Schroeder, CEO of MapR. "[And]...we get a worldwide support organization."

MapR will be part of the recently announced EMC Greenplum HD Enterprise Edition, an interface-compatible implementation of the Apache Hadoop software stack.

Earlier this month, EMC announced its planned partnership with MapR as part of a new push into big data database and MapReduce products.

MapReduce is a framework for processing enormous data sets and running high-performance analytics in parallel across a cluster of server nodes. A master node splits the input data into smaller chunks and assigns them to worker nodes, each of which applies a "map" function to its chunk; a "reduce" step then combines the intermediate results into the answer to the larger query. Because the work is broken into subsets that are processed simultaneously, MapReduce can process "big data" sets faster than traditional relational databases.
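
To make the map and reduce steps concrete, here is a minimal single-process word-count sketch in Python. It illustrates the general MapReduce pattern only; it is not Hadoop's or MapR's API. The function names are hypothetical, and the cluster is simulated by a list of input chunks that would, on real hardware, be mapped on separate worker nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    # Emit a (word, 1) pair for every word in one chunk of input text.
    return [(word, 1) for word in chunk.split()]


def shuffle(pairs):
    # Group intermediate values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Combine all values for one key into a single result.
    return key, sum(values)


def word_count(chunks):
    # In a real cluster each chunk would be mapped on a different worker node;
    # here the map calls simply run one after another.
    intermediate = chain.from_iterable(map_phase(c) for c in chunks)
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


print(word_count(["big data big clusters", "data nodes process data"]))
# {'big': 2, 'data': 3, 'clusters': 1, 'nodes': 1, 'process': 1}
```

Because each `map_phase` call depends only on its own chunk, the map work can be spread across as many nodes as there are chunks; only the shuffle needs to move data between them.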

"This is a major advancement for Hadoop users everywhere. MapR's innovations coupled with EMC's big data analytics capabilities and service will allow more people to use the power of big data analytics and enable substantial market growth," John Webster, a senior partner at market research firm the Evaluator Group, said in a statement. "MapR has managed to innovate on performance, cost reduction, dependability and ease-of-use all at once. This marks a major shift for the Hadoop market."

Luke Lonergan, CTO of EMC's Data Computing Division and a co-founder of Greenplum, the maker of a massively parallel data warehouse that EMC bought last year, said that EMC is working with dozens of resellers to get the MapR Hadoop software to customers.

"Combined with the EMC Greenplum Database, we will allow the co-processing of both structured and unstructured data within a single, seamless solution," said Scott Yara, co-founder of Greenplum and vice president of products for EMC's Data Computing Division.

MapR built a proprietary replacement for the Hadoop Distributed File System (HDFS) that can be swapped in for existing installations of the Hadoop file system. What MapR's product adds, according to Schroeder, is faster performance and greater resilience.

"HDFS is really like writing to CD ROM. You can write a file to it, but you can't access it through multiple readers. It's very constrained," he said.

MapR's product offers multiple channels to data via the Network File System (NFS) protocol, which is widely used in network-attached storage today. The company also re-architected the NameNode, the centerpiece of an HDFS file system. The NameNode maintains the file system's hierarchical namespace (the directory tree and the record of which nodes hold each file's data blocks), organized much like a domain name space. In standard HDFS that function runs on a single server; MapR distributes it, which Schroeder said improves availability.
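
The kind of metadata a NameNode keeps can be sketched as two lookup tables: hierarchical paths mapped to the blocks that make up each file, and blocks mapped to the nodes holding copies of them. The Python sketch below is a simplified, hypothetical model for illustration; the class names, block IDs and node names are invented, and it keeps everything in one in-memory object, which mirrors the single-server HDFS design rather than showing how MapR distributes the namespace.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FileEntry:
    # Ordered IDs of the data blocks that make up one file (hypothetical model).
    blocks: list[str]


@dataclass
class Namespace:
    # Hierarchical paths such as "/logs/app.log" mapped to file metadata.
    files: dict[str, FileEntry] = field(default_factory=dict)
    # Block ID mapped to the data nodes that hold a replica of that block.
    block_locations: dict[str, list[str]] = field(default_factory=dict)

    def add_file(self, path, placement):
        # placement: list of (block_id, [node, ...]) pairs chosen by the caller.
        self.files[path] = FileEntry([block_id for block_id, _ in placement])
        for block_id, nodes in placement:
            self.block_locations[block_id] = list(nodes)

    def locate(self, path):
        # Clients ask the namespace where each block lives, then read the
        # data directly from those nodes.
        return [(b, self.block_locations[b]) for b in self.files[path].blocks]

    def list_dir(self, directory):
        # Directory listing as a prefix match over full path keys
        # (includes files in subdirectories, unlike a real `ls`).
        prefix = directory.rstrip("/") + "/"
        return sorted(p for p in self.files if p.startswith(prefix))


ns = Namespace()
ns.add_file("/logs/app.log", [("blk_1", ["node-a", "node-b", "node-c"]),
                              ("blk_2", ["node-b", "node-c", "node-d"])])
print(ns.locate("/logs/app.log"))
print(ns.list_dir("/logs"))
```

Every read starts with a lookup in these tables, which is why a single server holding them becomes a single point of failure.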

MapR said it also eliminated all single points of failure in the Hadoop stack, including adding automated failover for the JobTracker, the Hadoop service that distributes application jobs among multiple nodes, so that if a primary node fails, its work is automatically picked up by the next available node.

MapR also added data mirroring for business continuity, wide area replication support and data snapshot capability to its software for greater resiliency.

"The only data protection within Hadoop is replication," Schroeder said. "Typically people make three copies fo data. That doesn't help you if you have a user or application error."

The snapshot capability allows administrators to roll an application back to a point in time before an error occurred. For example, if an application or user error occurred at 9 a.m., the administrator can roll the application image back to a snapshot taken at 8:59 a.m.
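
Schroeder's point, that replication alone faithfully copies a bad write to every replica while a snapshot preserves the earlier state, can be shown with a small Python model. The class and method names are hypothetical, times are assumed to be minutes since midnight, and each snapshot here is a full copy for simplicity; production snapshot features typically use copy-on-write so that a snapshot does not duplicate unchanged data.

```python
import bisect
import copy


class SnapshotVolume:
    """Toy volume with three-way replication and point-in-time snapshots."""

    def __init__(self, replicas=3):
        self.replicas = [{} for _ in range(replicas)]
        self.snapshot_times = []  # increasing times, in minutes since midnight
        self.snapshots = []       # full copies of the data taken at those times

    def write(self, key, value):
        # Replication applies every write, good or bad, to every copy.
        for replica in self.replicas:
            replica[key] = value

    def snapshot(self, at):
        # Freeze the current state. Callers must pass increasing times.
        self.snapshot_times.append(at)
        self.snapshots.append(copy.deepcopy(self.replicas[0]))

    def restore_before(self, error_time):
        # Roll every replica back to the latest snapshot taken before the error.
        i = bisect.bisect_left(self.snapshot_times, error_time) - 1
        if i < 0:
            raise LookupError("no snapshot precedes the error")
        self.replicas = [copy.deepcopy(self.snapshots[i]) for _ in self.replicas]
        return self.snapshot_times[i]


vol = SnapshotVolume()
vol.write("orders", 1200)
vol.snapshot(at=8 * 60 + 59)   # 8:59 a.m.
vol.write("orders", 0)         # bad write at 9:00 a.m. lands on all three copies
print([r["orders"] for r in vol.replicas])  # [0, 0, 0]
vol.restore_before(9 * 60)
print(vol.replicas[0]["orders"])            # 1200
```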

"It's the same thing you have in any serious storage platform from companies like EMC, HP or NetApp," he said.

Because MapR's file system is more efficient than HDFS, users will achieve two to five times the performance of standard Hadoop nodes in a cluster, according to Schroeder. That translates into being able to use about half the number of nodes typically required in a cluster, he said.

"Hadoop nodes cost about $4,000 per node depending on configuration. If you add in power costs, HVAC, switching, and rackspace, you'll probably double that," Schroeder said. "Our product can immediately save you $4,000 and over 8 years it'll save you $8000 per node."

Lucas Mearian covers storage, disaster recovery and business continuity, financial services infrastructure and health care IT for Computerworld. Follow Lucas on Twitter at @lucasmearian or subscribe to Lucas's RSS feed. His e-mail address is lmearian@computerworld.com.
