"CIOs and CSOs are putting their foot down on security," says Zettaset CEO Jim Vogt. "Hadoop has to get hardened to the same extent that existing data center technologies are today."
In another move that should improve Hadoop security, Hortonworks recently acquired XA Secure, a provider of security and policy management tools for Hadoop. Hortonworks plans to incorporate the XA Secure technology into its Hortonworks Data Platform and to submit it to the open-source Apache community. XA Secure's appeal stems from its centralized data security capabilities, which make governance easier.
New and Improved Proofs of Concept
Despite the steps taken by vendors such as Hortonworks to fortify their distributions, users still have concerns about Hadoop security.
Security is "still embryonic in Hadoop," says John Williams, senior vice president of platform operations at TrueCar, a Santa Monica, Calif.-based automotive data company that provides a negotiation-free car-buying service.
TrueCar processes and analyzes reams of data to help people decide what vehicle to buy. The company handles nearly 700 gigabytes of "primarily unstructured data like new vehicle models, used car inventory and auction data," as well as 100 million vehicle images every day, according to Williams. "Our data ecosystem is gigantic," he says.
TrueCar turned to Hortonworks' Hadoop distribution to better interpret and parse the staggering amount of data it crunches. Key benefits include a faster development process for new products and reduced overhead on costly infrastructure. But more important, Williams says, TrueCar no longer needs to shoehorn data into a limited-size SQL infrastructure. Rather, he says that "with Hadoop, it becomes economical to never delete any data ever," resulting in "this unbelievably rich historical log of all the data from your business."
But because of concerns about security, "we realized we couldn't do a proof of concept [with Hadoop] that's going to involve customer data like names, addresses and phone numbers," Williams says. "It's not quite ready for that."
Hadoop security "is a big concern" for ComScore as well, says Brown, noting that the company has taken considerable steps to better secure its data. "We have one set of networks for the capture of data and one set of networks for the processing of data," he explains. Additional precautions include relying on a standard Active Directory security infrastructure and applying field-level encryption to sensitive data.
So how can organizations determine whether the benefits of business-friendly Hadoop systems outweigh the platform's security risks? For many, the answer is an elaborate proof of concept that demonstrates how Hadoop and its related tools can be integrated into an enterprise's existing architecture.
For example, TrueCar could have taken a traditional approach to testing Hadoop by simply running a handful of queries against the Apache Hive data warehouse software. Instead, it opted for an "outside the box" experiment that involved performing "a massively parallel ingest on thousands of gnarly formatted data files" on Hadoop, Williams says. "When that proof of concept succeeded, it just blew everyone's mind."