Hadoop is ready for the enterprise, IT execs say
Big companies are using Hadoop systems in big projects, despite concerns about issues such as security.
Computerworld - Despite some lingering user concerns about security and other issues, Hadoop is ready for enterprise use, according to IT executives at the Hadoop World conference in New York earlier this month.
Larry Feinsmith, managing director of IT at JPMorgan Chase, told a keynote audience that the financial services firm has been increasingly using the open-source storage and data analysis framework for almost three years.
JPMorgan Chase still relies heavily on relational database systems for transaction processing, but it uses Hadoop technology for a growing number of purposes, including fraud detection, IT risk management and self-service, Feinsmith said.
With over 150 petabytes of data stored online, 30,000 databases and 3.5 billion log-ins to user accounts, data is the lifeblood of JPMorgan Chase, Feinsmith said.
Hadoop's ability to store vast volumes of unstructured data allows the company to collect and store Web logs, transaction data and social media data. "Hadoop allows us to store data that we never stored before," he said.
The data is aggregated into a common platform for use in a range of customer-focused data mining and data analytics tools, Feinsmith said.
Meanwhile, eBay is using Hadoop technology and the HBase database, which supports real-time analysis of Hadoop data, to build a new search engine for its auction site.
Hugh Williams, vice president of experience, search and platforms at eBay, said the new engine, code-named Cassini, will replace technology the company has used since the early 2000s. The update is needed in part to handle surging volumes of data.
He noted that eBay has more than 97 million active buyers and sellers and over 200 million items for sale in 50,000 categories. The site handles close to 2 billion page views, 250 million search queries and tens of billions of database calls daily, he added.
The company has 9 petabytes of data stored on Hadoop and Teradata clusters, and the amount is growing quickly, he said.
Williams said about 100 eBay engineers are working on the Cassini project, making it one of the company's largest development efforts.
The new engine, slated to go live next year, is expected to respond to user queries with results that are context-based and more accurate than those provided by the current system, he said.
Feinsmith warned that IT shops interested in Hadoop should be aware of potential security issues. And he explained that aggregating and storing data from multiple sources can create a slew of problems related to access control and data management, while raising questions about data entitlement and data ownership.
Feinsmith also listed other potential Hadoop drawbacks that users should be aware of before embarking on big projects.
For instance, he said the Hadoop marketplace is "very confusing," featuring an oft-changing slate of vendors, products and standards. In addition, skilled Hadoop engineers are scarce.
And Williams noted that related technologies, such as HBase, are still somewhat immature, which raises questions about system stability.
But Hadoop has plenty of potential. Feinsmith said that IT workers at JPMorgan Chase are debating whether relational database technologies will evolve to meet the bank's emerging big data needs, or whether Hadoop-based systems will become adept at transaction processing.
This version of this story was originally published in Computerworld's print edition. It was adapted from an article that appeared earlier on Computerworld.com.