MapR's New Hadoop Distribution Promises No-Risk Upgrade
CIO - MapR Technologies has taken a different tack with its distribution than competitors Cloudera and Hortonworks. Unlike them, MapR has committed to backward compatibility, enabling organizations to run the Hadoop MapReduce 1.x and YARN schedulers simultaneously on the same nodes in the cluster.
By ensuring that MapReduce 1.x and YARN schedulers can coexist, MapR gives MapReduce 1.x users an easy and risk-free path to upgrade to the new scheduler, says Jack Norris, CMO of MapR Technologies.
"Our focus is really production use of Hadoop," Norris says. "Once you go into production, it's about availability and uptime and integration with existing apps. We're backward compatible from previous distributions to this distribution because you can't introduce changes easily into a production environment. Customers say, 'YARN's exciting, but I want to put my toe in the water. I've got existing jobs that are running.' We've got customers running over 20,000 jobs a day on our platform."
Apache Hadoop YARN (short for Yet Another Resource Negotiator) is the foundation of Hadoop 2.0, released last October. YARN serves as the Hadoop operating system, taking what was a single-use data platform for batch processing and turning it into a multi-use platform that enables batch, interactive, online and stream processing.
YARN acts as the primary resource manager and mediator of access to data stored in the Hadoop Distributed File System (HDFS), giving organizations the ability to store data in a single place and then interact with it in multiple ways, simultaneously, with consistent levels of service.
Norris says that by combining YARN with MapR's read-write (R/W) POSIX data platform, MapR enables YARN-based applications not only to run on a Hadoop cluster and share compute resources, but also to read, write and update data in the underlying distributed file system and database tables. As a result, organizations can develop and deploy a broader set of big data applications.
"YARN opens up Hadoop for processing patterns beyond just MapReduce," says Evan Quinn, research director, Enterprise Management Associates. "MapR's Hadoop distribution extends YARN even further by adding a full, open standard NFS interface in addition to HDFS, enabling non-MapReduce applications to optimally take advantage of a cluster's storage."
"When we talk about a general-purpose storage platform, it's about random read-write," Norris says. "If you want to open up the processing to some other type of application, you don't want to have to rewrite that application just to take advantage of Hadoop. You just want it to run on the platform. Having to rewrite it to use the Hadoop distributed file system (HDFS) API introduces change that can require a lot of forethought and planning, and in some cases a redesign of the application. We allow you to run directly on the MapR platform with no changes, only now you're taking advantage of the highly distributed framework that MapR provides."
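In plain terms, "random read-write" means an application can open an existing file, jump to any byte offset and overwrite data in place, something HDFS's write-once, append-only file model does not allow. The Python sketch below is a hypothetical illustration, not MapR code: it uses only standard file I/O and imports no Hadoop client library. The `CLUSTER_DIR` environment variable is an assumed name for a directory where the cluster would be NFS-mounted; when it is unset, a local temporary directory stands in.

```python
import os
import tempfile


def update_record(path: str, offset: int, data: bytes) -> None:
    """Overwrite bytes at `offset` in place using ordinary POSIX file I/O.

    No HDFS client library is involved: the same code runs against a local
    disk or (by assumption) an NFS-mounted cluster directory.
    """
    # "r+b" opens an existing file for reading and writing without truncating it.
    with open(path, "r+b") as f:
        f.seek(offset)  # random access: jump straight to the bytes to change
        f.write(data)   # in-place update; HDFS files cannot be modified this way


def read_file(path: str) -> bytes:
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    # CLUSTER_DIR is a hypothetical variable naming an NFS mount of the cluster;
    # if it is unset, a local temporary directory stands in for the demo.
    base = os.environ.get("CLUSTER_DIR") or tempfile.mkdtemp()
    path = os.path.join(base, "records.dat")
    with open(path, "wb") as f:
        f.write(b"AAAAAAAA")
    update_record(path, 2, b"BB")
    print(read_file(path))  # b'AABBAAAA'
```

The point of the sketch is what it leaves out: because the application relies only on `open`, `seek` and `write`, nothing in it would need to change to target a different POSIX-compatible storage layer.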
Whereas Cloudera's and Hortonworks' Hadoop distributions are entirely open source, MapR Technologies has stripped out the HDFS layer of Hadoop and replaced it with its own custom, proprietary data layer. That layer still supports the HDFS APIs, but it is designed to address what MapR considers limitations in the Hadoop architecture, such as the lack of snapshot and disaster recovery capabilities.
"It's compatible with all the standard enterprise applications and tools. Any package designed for Hadoop runs on MapR with no changes or recompiling," Norris says. "We didn't just look at the community roadmap and try to jump ahead and do features six months before the community tried to do them. We looked at the limitations we saw and architected for those. It's really hard for re-architecture to happen in an open source community."
MapR Sandbox Provides Free VM Installation and Tutorials
MapR also announced the availability of MapR Sandbox for Hadoop, a virtualized environment containing MapR's distribution that is designed to let users start exploring and experimenting with Hadoop in less than five minutes. The Sandbox is a complete, fully configured virtual machine installation of the MapR distribution with point-and-click tutorials for developers, analysts and administrators.
"Organizations face a shortage of Hadoop developers and data scientists, and without useful and easily accessible training tools, productive Hadoop developers will continue to be in short supply," says Tomer Shiran, vice president of product management at MapR Technologies. "With the MapR Sandbox, developers have all the tools they need in a convenient and free package to get up to speed on Hadoop quickly."
MapR Integrates with HP Vertica Analytics Platform
Finally, the company announced the early access release of the new HP Vertica Analytics Platform on MapR, a high-performance, interactive SQL-on-Hadoop solution that tightly integrates HP Vertica's analytic platform with MapR's Hadoop distribution. The company says it provides 100 percent ANSI SQL compliance with advanced interactive analytic capabilities, as well as support for business intelligence (BI) and ETL tools.
"Organizations embracing Hadoop have been struggling to empower large groups of business analysts who require sophisticated SQL and BI tools to do their jobs, but feel handcuffed when using incomplete, SQL-like approaches," says John Schroeder, CEO and cofounder of MapR Technologies. "Providing HP Vertica's very high-performance and rich SQL and built-in analytic functions on MapR's best-of-breed platform for Hadoop sets business analysts free to do faster, interactive analytics from data harnessed by Hadoop."
Thor Olavsrud covers IT Security, Big Data, Open Source, Microsoft Tools and Servers for CIO.com. Follow Thor on Twitter @ThorOlavsrud.