What's the big deal about Hadoop?

Customers love it, but it requires training and an advanced grasp of analytics

Be careful out there

As good as Hadoop is, there are some cautions. First, "don't commit to or standardize on one vendor quite yet," because it's such a "turbulent" space right now, Forrester's Kobielus suggests. "The vendors are all continuing to rapidly evolve." On the other hand, that does create a "vibrant ecosystem," he says.

Marcus Collins, an analyst at Gartner, says it's up to the enterprise to get the expertise needed to get the most out of Hadoop. "It's asking for a level of analytics capabilities that many companies don't have today," he says. "You need to train your staff and invest in analytics, and that will put you in the best position to exploit this technology."

Another key consideration: Most shops will need to hire Hadoop specialists, who are in short supply, or will need to train in-house staffers. "It's not trivial to use," eBay's Williams says. "So we've put a lot of training in place so our engineers know how to use Hadoop and can write code. You're going to have to invest in your developers and program manager so they can become proficient users. Don't underestimate that."

Also be prepared for an organizational learning curve when it comes to relying on an open-source system for a mission-critical application. Using it for a few under-the-radar projects is one thing, but it's another entirely to develop a massive system for all the world to see. Be ready to educate your management about the benefits of open source.

Another tip from Collins is to stay "intimately involved" with the project to make sure it goes as planned. "Don't just give your problems to your Hadoop vendor," he says. At the end of the day, "you're going to be running it."

Also, Kobielus explains, best practices with Hadoop are still evolving, so it's best to identify a short-term benefit you can get from the system and avoid anything too long-term at the start. As you build up expertise, you can find more uses for the software. In the meantime, the approaches that early adopters are using to build out and scale their clusters "are all over the board," he says.

Adds to, doesn't replace, other databases

Most customers are using Hadoop in addition to, not instead of, other types of software. eBay, for instance, still uses relational databases and also does "a lot of custom [database] work," Williams explains. "At eBay, we see value in using multiple technologies to work with our data. Hadoop is a terrific choice for certain uses, while other technologies work alongside it for other purposes."

For example, when it comes to transactions, "it makes total sense to use a relational database system," he says. But overall the idea is to remain "flexible in what technologies we use at eBay; we don't see a world where there will be one unifying technology."

The same is true at Concurrent. Hadoop hasn't replaced the company's use of traditional relational databases, including MySQL, PostgreSQL and Oracle. "It is a combined solution," Lazzaro says. "We use Hadoop to do the heavy lifting, such as large-scale data processing. We then use Map/Reduce within Hadoop to create summary data that is easily accessible through a traditional RDBMS."

What tends to happen with relational databases, he explains, is that when the data volume grows too large -- say, 250 million records a day -- the database becomes "non-responsive to data queries." "However," he says, "Hadoop at that scale is not even breaking a sweat. Hadoop therefore can store, say, 5 billion records and with Map/Reduce we can create a summary of that data and insert it into a standard RDBMS for quick access."
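
The pattern Lazzaro describes, summarizing raw records with Map/Reduce and loading the much smaller result into a relational database, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Concurrent's code: the sample records, the function names and the daily_summary table are all hypothetical, the map and reduce steps run in a single process rather than on a Hadoop cluster, and SQLite stands in for whatever RDBMS a shop actually uses.

```python
import sqlite3
from collections import defaultdict

# Hypothetical raw records: (day, event_type) pairs standing in for the
# billions of rows a real Hadoop cluster would hold.
RAW_RECORDS = [
    ("2012-03-01", "view"),
    ("2012-03-01", "click"),
    ("2012-03-01", "view"),
    ("2012-03-02", "view"),
]


def map_phase(records):
    """Map step: emit one ((day, event_type), 1) pair per raw record."""
    for day, event_type in records:
        yield (day, event_type), 1


def reduce_phase(pairs):
    """Reduce step: sum the counts emitted for each key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def load_summary(conn, summary):
    """Write the small summary into a relational table for quick lookups."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_summary ("
        "day TEXT, event_type TEXT, total INTEGER, "
        "PRIMARY KEY (day, event_type))"
    )
    rows = [(day, event, total) for (day, event), total in sorted(summary.items())]
    conn.executemany("INSERT OR REPLACE INTO daily_summary VALUES (?, ?, ?)", rows)
    conn.commit()


def summarize_to_rdbms(records, conn):
    """Run map, then reduce, then load the result into the database."""
    summary = reduce_phase(map_phase(records))
    load_summary(conn, summary)
    return summary


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")  # SQLite as a stand-in RDBMS
    summarize_to_rdbms(RAW_RECORDS, connection)
    for row in connection.execute(
        "SELECT day, event_type, total FROM daily_summary ORDER BY day, event_type"
    ):
        print(row)
```

The point of the split is that the heavy pass over the raw data happens once, in bulk, while the relational table only ever holds one row per day and event type, so ordinary SQL queries against it stay fast.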

In general, Williams says, "I don't think too much" about Hadoop's limitations. "I think about the opportunities. You can find solutions to any problems pretty quickly" through the open source community. "Some people do gripe about different aspects of Hadoop, but it's a reasonably new thing. It's like Linux was back in 1993 or 1994."

"We do see unique technology challenges at our scale and with our extreme data," Williams explains, among them architecting data centers, designing a network to support Hadoop and choosing the right hardware.

Overall, Hadoop has been a good strategy for eBay, Williams says. "For us it's been an absolute game changer. It's what our engineers want to use and it's really helped us become a really data-driven company."

Todd R. Weiss is an award-winning technology journalist and freelance writer who worked as a staff reporter for Computerworld.com from 2000 to 2008. Follow him on Twitter, where his handle is @TechManTalking, or e-mail him at toddrweiss@gmail.com.

Copyright © 2012 IDG Communications, Inc.
