Hadoop skills are in high demand

Enterprise adoption of the technology has outstripped the supply of skilled Hadoop pros

NEW YORK -- The growing enterprise interest in Hadoop and related technologies is driving demand for professionals with big data skills.

Analysts and IT managers at the Hadoop World conference here this week repeatedly pointed to skills availability as one of the key challenges companies face in adopting Hadoop and said that those with the right skills could command healthy premiums.

One indication of just how limited that skills supply is: IT executives from JP Morgan Chase and EBay who delivered keynote addresses at the conference used the opportunity to recruit from the audience.

Hugh Williams, vice president of experience, search and platforms at EBay, told audience members that the auction site is recruiting Hadoop professionals and invited those interested in exploring opportunities to speak with him.

Larry Feinsmith, managing director at JP Morgan Chase, who followed Williams, only half-jokingly told the audience that Chase was also hiring and would be willing to pay 10% more than EBay.

"Hadoop is the new data warehouse. It is the new source of data" within the enterprise, said James Kobielus, an analyst with Forrester Research. "There is a premium on people who know enough about the guts of Hadoop" to help companies take advantage of it, he said.

Hadoop allows companies to store and manage far larger volumes of structured and unstructured data than can be managed affordably by today's relational database management systems.

A growing number of companies have begun tapping the technology to store and analyze petabytes of data, such as weblogs, clickstream data and social media content, to gain better insights into their customers and their business.

The increasing enterprise adoption is driving demand for people with advanced analytics skills, Kobielus said. That includes people with backgrounds in areas such as multivariate statistical analysis, data mining, predictive modeling, natural language processing, content analysis, text analysis and social network analysis, he said.

"Big data in the broader sense -- and Hadoop in particular -- is driving demand for people who have experience doing advanced analytics using newer approaches such as MapReduce and R for predictive and statistical modeling," he said. These are the data analysts or data scientists who will work with structured and unstructured data in Hadoop environments to deliver new insights and intelligence to the business, he said.

Interest in Hadoop is also creating demand for Hadoop platform management professionals, Kobielus said. Their job will be to implement, secure, manage and optimize Hadoop clusters and to ensure that those clusters remain available for enterprise use. "These are the people who build out and optimize the platform" on which Hadoop applications run, he said.

"The database administrators who administer Teradata and [Oracle's] Exadata are the same people who are now beginning to redefine their roles as Hadoop cluster administrators," he said. "They realize this is a brand new world." Also expect to see demand for storage management professionals and for those who can help integrate Hadoop environments with existing relational database technologies.

Demand for Hadoop professionals falls into three broad categories: data analysts or data scientists; data engineers; and IT data management professionals, said Martin Hall, CEO of Karmasphere, which sells software products for Hadoop environments.

The data management professionals will be the ones who choose, install, manage, provision and scale Hadoop clusters, Hall said. These are the IT professionals who decide whether Hadoop runs in the cloud or on premises, which vendors to choose, which distribution of Hadoop to use, how large the cluster should be and whether it will be used for running production applications or for quality-testing purposes.

The skills required for this role are similar to those required for doing the same tasks in traditional relational database and data warehouse environments, he said.

Hadoop data engineers, meanwhile, are those responsible for creating the data processing jobs and building the distributed MapReduce algorithms for use by data analysts. Those with skills in areas such as Java and C++ could find more opportunities as enterprises begin deploying Hadoop, he said.

The third category of professionals in demand is data scientists with experience in tools such as SAS and SPSS and in programming languages such as R, Hall said. These are the professionals who will generate, analyze, share and integrate intelligence gathered and stored in Hadoop environments.

For the moment, the shortage of Hadoop talent means companies need help from service providers to deploy the technology. One indication of this is that the Hadoop-related revenues generated by professional consulting and systems integration firms are significantly larger than the revenues from sales of Hadoop products, Kobielus said.

Companies such as Cloudera, MapR, Hortonworks and IBM today offer training courses in Hadoop that companies can take advantage of to build their own Hadoop centers of excellence, he said.

Jaikumar Vijayan covers data security and privacy issues, financial services security and e-voting for Computerworld. Follow Jaikumar on Twitter at @jaivijayan or subscribe to Jaikumar's RSS feed. His e-mail address is jvijayan@computerworld.com.

Copyright © 2011 IDG Communications, Inc.