One of the challenges companies often face when using Hadoop to aggregate massive volumes of structured and unstructured data is finding a way to efficiently control and manage user access to that data.
Zettaset, a Mountain View, Calif.-based vendor of tools for managing big data, on Wednesday announced a new security initiative to help companies address that issue.
Under its SHadoop initiative, Zettaset will integrate new functions into its existing Hadoop Orchestrator platform that will allow IT administrators to implement role-based access control over Hadoop environments.
The new tools will allow administrators to better define what different categories of users can and cannot do with data in a Hadoop platform -- giving administrators a way to restrict users from executing certain jobs, or from importing or exporting certain kinds of data, according to Zettaset.
SHadoop will allow administrators to establish a baseline security policy for all users with access to a Hadoop system, the company said. It will then allow them to track, log and audit all user or group activity within the Hadoop platform.
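The combination described here (role-based permissions, a baseline policy for every user, and an audit trail) can be illustrated with a minimal Python sketch. It is not Zettaset's implementation and does not use any Hadoop or SHadoop API. The role names, action names, deny-by-default baseline and the `AccessController` class are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical action names; the article only mentions executing jobs
# and importing or exporting data.
RUN_JOB = "run_job"
IMPORT_DATA = "import_data"
EXPORT_DATA = "export_data"

# Baseline policy applied to every user before role grants.
# Assumption: deny everything unless a role allows it.
BASELINE = frozenset()

# Hypothetical roles mapped to the actions they may perform.
ROLE_GRANTS = {
    "analyst": frozenset({RUN_JOB}),
    "data_engineer": frozenset({RUN_JOB, IMPORT_DATA}),
    "admin": frozenset({RUN_JOB, IMPORT_DATA, EXPORT_DATA}),
}


@dataclass
class AccessController:
    user_roles: dict  # user name -> set of role names
    audit_log: list = field(default_factory=list)

    def is_allowed(self, user, action):
        """Check an action against the baseline plus the user's roles, and log it."""
        allowed = set(BASELINE)
        for role in self.user_roles.get(user, ()):
            allowed |= ROLE_GRANTS.get(role, frozenset())
        decision = action in allowed
        # Every request is recorded, whether it was allowed or denied.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "action": action,
            "allowed": decision,
        })
        return decision


controller = AccessController(user_roles={"alice": {"analyst"}, "bob": {"admin"}})
print(controller.is_allowed("alice", RUN_JOB))      # True
print(controller.is_allowed("alice", EXPORT_DATA))  # False
print(controller.is_allowed("bob", EXPORT_DATA))    # True
print(len(controller.audit_log))                    # 3
```

In this sketch a denied request is logged just like an allowed one, which is what makes the log usable for auditing rather than only for troubleshooting.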
Future versions of SHadoop will enable companies to encrypt data stored in a Hadoop cluster or transmitted between Hadoop nodes, Zettaset said.
Those features all address enterprise concerns around using Hadoop, according to analysts.
The Apache Hadoop Distributed File System allows companies to store and manage petabytes of data from disparate data sources far more efficiently than relational database management systems allow. A growing number of companies have begun using the open-source technology to aggregate and analyze huge volumes of structured and unstructured data captured from websites, social media networks, emails, audio and video files, sensors and machines.
While this data aggregation has enabled new levels of social media mining, sentiment analysis and fraud detection, it also creates new access control problems. Unlike traditional database management technologies, Hadoop does not give administrators many ways to control access to data beyond access control lists and Kerberos-based authentication.
"So, while you can authenticate users, how do you go about setting up fine-grained access provisions?" said David Menninger, an analyst with Ventana Research. "You can segregate sensitive data into separate files, nodes or clusters, but that still doesn't give you the row-level access control that people are used to in relational databases."
As a result, security and access control continue to be the areas where enterprise satisfaction with Hadoop lags behind traditional databases, he said. In a study by Ventana Research last year, fewer than half of the respondents said they were satisfied with Hadoop encryption and security capabilities, compared to nearly 70% who said they were satisfied with the security capabilities of their non-Hadoop database technologies.
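The difference between file-level and row-level control that Menninger describes can be made concrete with a small Python sketch. It is purely illustrative: the records, user names and policy tables below are hypothetical, and nothing here reflects how Hadoop or any relational database actually enforces access.

```python
# Hypothetical records; "region" stands in for any sensitive attribute.
RECORDS = [
    {"id": 1, "region": "EU", "amount": 120},
    {"id": 2, "region": "US", "amount": 75},
    {"id": 3, "region": "EU", "amount": 310},
]

# File-level control: a user can read the whole file or none of it.
FILE_READERS = {"alice", "bob"}

# Row-level control: each user sees only the rows matching a per-user rule.
ROW_POLICIES = {
    "alice": lambda row: row["region"] == "EU",
    "bob": lambda row: True,
}


def read_file_level(user):
    """All-or-nothing access, similar in spirit to a file ACL."""
    if user not in FILE_READERS:
        raise PermissionError(f"{user} may not read this file")
    return list(RECORDS)


def read_row_level(user):
    """Return only the rows the user's policy permits; unknown users see nothing."""
    policy = ROW_POLICIES.get(user, lambda row: False)
    return [row for row in RECORDS if policy(row)]


print(len(read_file_level("alice")))                  # 3
print([row["id"] for row in read_row_level("alice")])  # [1, 3]
print(read_row_level("carol"))                         # []
```

With only the file-level check, keeping US rows away from "alice" would mean splitting the data into separate files, which is the workaround Menninger mentions.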
"Our Hadoop research shows the biggest gap in satisfaction between Hadoop users and other big data technologies is security," he said.
Nathaniel Rowe, an analyst with the Aberdeen Group, said that the security capabilities available with many current Hadoop distributions do not always match enterprise needs.
"Hadoop is a rapidly developing platform," Rowe said. "The way it has taken the big data world by storm is absolutely incredible." But on the security front, it is not always "quite up to snuff," he said.
Jaikumar Vijayan covers data security and privacy issues, financial services security and e-voting for Computerworld. Follow Jaikumar on Twitter at @jaivijayan.