EMC adds unstructured big-data analytics to Greenplum platform

Offers a 1,000-plus-node test bed for Hadoop developers

September 21, 2011 11:20 AM ET

Computerworld - EMC's Greenplum subsidiary today announced a new capability in its Apache Hadoop Data Computing Appliance (DCA) that allows users to mix and match unstructured and structured data analytics platforms.

EMC also announced its Greenplum Analytics Workbench, a 1,000-plus-node test bed for software integration tests of Apache Hadoop software.

The test bed gives the Hadoop open-source community the testing resources to quickly identify bugs, stabilize new releases and optimize hardware configurations, with the goal of speeding Hadoop development. All testing and results will be contributed back to the Apache Software Foundation and the open-source community, and EMC's testing will be planned in coordination with the Apache Hadoop project. Hadoop is an open-source software platform for analyzing large quantities of data; it was built on ideas Google published in its MapReduce and Google File System papers.

On the appliance front, EMC introduced the Modular Data Computing Appliance, which lets users combine a massively parallel processing (MPP) relational database with enterprise-class Apache Hadoop in a single, unified appliance that handles both structured and unstructured data processing.

Greenplum introduced the DCA in October 2010. An updated version of the DCA that included a Hadoop appliance was released this past May.

The Greenplum HD (Hadoop) DCA is built on Intel x86 servers and combines Greenplum's structured relational database (EMC acquired Greenplum last year) with the Apache open-source distribution of Hadoop. The older version of the appliance was based on Sun Fire x64 servers.

According to Scott Yara, co-founder of Greenplum and vice president of products for EMC's Data Computing Division, administrators can read and write files in parallel between Greenplum and HDFS (the Hadoop Distributed File System), enabling rapid data sharing. Cross-platform analysis can be performed using Greenplum SQL and advanced analytic functions that access data stored on HDFS.

The new Modular DCA adds high-performance computing modules in the form of SAS Institute's In-Memory Analytics software, allowing it to serve up both structured data, such as databases, and unstructured file data, according to Yara.

"The main change is that it can perform parallel processing using server memory through the use of business analytics software [from SAS]," Yara said. "We wanted to offer a Lego-building-block-type architecture."

With the SAS software, structured and unstructured data can be spread across multiple x86 hosts, allowing users to perform in-memory computations on each server node in a clustered configuration.

"The power of the appliance is that it can perform all these complex problems in parallel," Yara said.

The new Modular DCA is undergoing product trials and is expected to be available by the end of this year, Yara said.

Lucas Mearian covers storage, disaster recovery and business continuity, financial services infrastructure and healthcare IT for Computerworld. Follow Lucas on Twitter at @lucasmearian or subscribe to Lucas's RSS feed. His e-mail address is lmearian@computerworld.com.
