Hadoop gets real

Robust data processing and storage capabilities make Hadoop both wildly popular and wildly complex. Here's how four IT leaders managed to bring Hadoop systems from the sandbox into production.

Technology professionals with strong skills in Apache Hadoop are among the hardest to find. In fact, demand for people with Hadoop expertise has skyrocketed 34% since last year, according to Wanted Analytics, a research firm specializing in the labor market.

But while competition for talent is fierce, the days of highly paid data science rock stars might be coming to an end. Hadoop is known for its robust data processing and storage power, as well as for its complexity. Businesses that need such functionality, however, may no longer have to hunt far and wide for IT pros with Hadoop skills, because vendors are building Hadoop systems that are easier to use.

Pivotal Software, Syncsort, MapR Technologies and Zettaset are just a few of the vendors creating business-friendly applications for crunching large data sets on Hadoop. The result is a burgeoning ecosystem of products that promise to help IT departments reduce their dependence on high-priced talent, increase security, cut costs and better align big data activities with business goals.

In fact, as these systems proliferate and mature, many IT professionals are wondering whether Hadoop can overcome its inherent security weaknesses and emerge as a full-fledged operating system, not unlike Microsoft Windows with its orbit of business applications.

One satisfied user is Michael Brown, CTO at Reston, Va.-based ComScore. A Web intelligence firm that monitors the online shopping behavior of more than 2 million people and collects information that helps advertisers create targeted marketing campaigns, ComScore ingests a whopping 60 billion new pieces of data every day.

To ensure that its data is readily accessible, ComScore started using MapR's Hadoop distribution in the fall of 2011. But even with MapR's system, ComScore's data scientists needed to painstakingly hand-code applications to prep data before offloading it into Hadoop. That changed when ComScore began using Syncsort DMX-h last year. Certified by MapR in June, Syncsort's Hadoop extract, transform and load (ETL) software lets ComScore offload and modify mission-critical data from legacy systems into Hadoop without hand-coding. As a result, the company can process data faster and bring new apps to market more rapidly while cutting its hardware investments.

In one recent proof of concept, ComScore compared developing a job that required 75 lines of code on the Apache Pig platform with building the equivalent job in Syncsort DMX-h. The task took 25 hours using the Apache system, but just 12 hours with Syncsort's offering.

MapR's new application gallery further simplifies Hadoop. Launched in June, the gallery includes a range of ready-made apps for all kinds of Hadoop activities, from provisioning and security to business intelligence and machine learning.

Until now, organizations have relied on highly skilled in-house programmers and engineers to tackle the sophisticated work involved in building big data apps. Options such as MapR's gallery, however, promise to make the app development process easier. Indeed, Scaling Data, founded by former executives of Hadoop distributor Cloudera, recently raised $4.4 million in venture capital to develop a line of easy-to-use applications that will run on Hadoop.

"We're now starting to see applications being sold that can be added to the Hadoop system, which is a big change," says Brown, who is considering using MapR's app gallery.
