Hadoop is ready for the enterprise, IT execs say
Big companies are using Hadoop systems in big projects, despite concerns about issues such as security.
Computerworld - Despite some lingering user concerns about security and other issues, Hadoop is ready for enterprise use, according to IT executives at the Hadoop World conference in New York earlier this month.
Larry Feinsmith, managing director of IT at JPMorgan Chase, told a keynote audience that the financial services firm has been using the open-source storage and data analysis framework for almost three years, and that its use is steadily growing.
JPMorgan Chase still relies heavily on relational database systems for transaction processing, but it uses Hadoop technology for a growing number of purposes, including fraud detection, IT risk management and self service, Feinsmith said.
With more than 150 petabytes of data stored online, 30,000 databases and 3.5 billion log-ins to user accounts, data is the lifeblood of JPMorgan Chase, Feinsmith said.
Hadoop's ability to store vast volumes of unstructured data allows the company to collect and store Web logs, transaction data and social media data. "Hadoop allows us to store data that we never stored before," he said.
The data is aggregated into a common platform for use in a range of customer-focused data mining and data analytics tools, Feinsmith said.
Meanwhile, eBay is using Hadoop technology and the HBase database, which supports real-time analysis of Hadoop data, to build a new search engine for its auction site.
Hugh Williams, vice president of experience, search and platforms at eBay, said the new engine, code-named Cassini, will replace technology the company has used since the early 2000s. The update is needed in part to handle surging volumes of data.
He noted that eBay has more than 97 million active buyers and sellers and over 200 million items for sale in 50,000 categories. The site handles close to 2 billion page views, 250 million search queries and tens of billions of database calls daily, he added.
The company has 9 petabytes of data stored on Hadoop and Teradata clusters, and the amount is growing quickly, he said.
Williams said about 100 eBay engineers are working on the Cassini project, making it one of the company's largest development efforts.
The new engine, slated to go live next year, is expected to respond to user queries with results that are context-based and more accurate than those provided by the current system, he said.
Feinsmith warned that IT shops interested in Hadoop should be aware of potential security issues. He explained that aggregating and storing data from multiple sources can create a slew of problems related to access control and data management, while raising questions about data entitlement and data ownership.
Feinsmith also listed other potential Hadoop drawbacks that users should be aware of before embarking on big projects.
For instance, he said the Hadoop marketplace is "very confusing," featuring an oft-changing slate of vendors, products and standards. In addition, skilled Hadoop engineers are scarce.
Williams noted that related technologies, such as HBase, are still somewhat immature, which raises questions about system stability.
But Hadoop has plenty of potential. Feinsmith said that IT workers at JPMorgan Chase are debating whether relational database technologies will evolve to meet the bank's emerging big data needs, or if Hadoop-based systems will become adept at transaction processing.
This version of this story was originally published in Computerworld's print edition. It was adapted from an article that appeared earlier on Computerworld.com.