What's the big deal about Hadoop?

Customers love it, but it requires training and an advanced grasp of analytics


With Hadoop, Concurrent engineers found that they could handle the growing needs of their clients, he says. "During testing they tried processing two billion records a day for the customer, and by adding another server to the node we found we could complete what they needed and that it scaled immediately," Lazzaro says.
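
To put the test volume in perspective, the short Python sketch below converts two billion records a day into a sustained per-second rate. It assumes, purely for illustration, that the load is spread evenly across all 24 hours; the article does not say how the records actually arrived, and the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def records_per_second(records_per_day: int) -> float:
    """Average rate, assuming records arrive evenly over a full day."""
    return records_per_day / SECONDS_PER_DAY


# Two billion records a day is the test volume quoted in the article.
rate = records_per_second(2_000_000_000)
print(f"{rate:,.0f} records per second")
```

Under that even-load assumption, the figure works out to roughly 23,000 records every second, around the clock.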

The company ran the same tests on traditional databases for comparison. One key benefit of Hadoop, it found, was that additional hardware could be added quickly as needed without extra licensing fees, because Hadoop is open source, he says. "That became a differentiator," Lazzaro says.

Another Hadoop user, life sciences and genomics company NextBio, of Santa Clara, Calif., works on projects involving huge data sets for human gene sequencing and related scientific research.

Satnam Alag, vice president of engineering for NextBio, says Hadoop allows his shop to perform "mass analytics on huge amounts of public data."

"We bring in all kinds of genomics data, then curate it, enrich it and compare it with other data sets" using Hadoop, Alag says. "It allows mass analytics on huge amounts of public data" for NextBio's customers, which range from pharmaceutical companies to academic researchers. NextBio uses a Hadoop distribution from MapR.

A typical full genome sequence can contain 120GB to 150GB of compressed data, requiring about half a terabyte of storage for processing, he says. In the past, it would take three days to analyze it, but with 30 to 40 machines running Hadoop, NextBio's staff can do it now in three to four hours. "For any application that has to make use of this data, this makes a big difference," Alag says.
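
The size of that improvement is easy to work out from the figures Alag cites. The Python sketch below is a back-of-the-envelope calculation, not a benchmark: it assumes "three days" means 72 hours of wall-clock time, and the function name is hypothetical.

```python
HOURS_PER_DAY = 24


def speedup(old_hours: float, new_hours: float) -> float:
    """How many times faster the new run is than the old one."""
    return old_hours / new_hours


# "Three days" before Hadoop, taken here as 72 wall-clock hours (an assumption).
old_hours = 3 * HOURS_PER_DAY

# "Three to four hours" on the 30-to-40-machine Hadoop cluster.
for new_hours in (4, 3):
    factor = speedup(old_hours, new_hours)
    print(f"{old_hours} h -> {new_hours} h: about {factor:.0f}x faster")
```

On those assumptions, the analysis runs about 18 to 24 times faster than before. The article does not describe the earlier hardware, so the numbers say nothing about how efficiently each machine in the cluster is used.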

Another big advantage is that he can keep scaling the system up as needed by simply adding more nodes. "Without Hadoop, scaling would be challenging and costly," he says. This so-called horizontal scaling -- adding more nodes of commodity hardware to the Hadoop cluster -- is a "very cost-effective way of scaling our system," Alag explains. The Hadoop framework "automatically takes care of nodes failing in the cluster."

That's dramatically changed the way the company can expand its computing power to meet its needs, he says. "We don't want to spend millions of dollars on infrastructure. We don't have that kind of money available."

Allows for new types of applications

One huge benefit of Hadoop is its ability to analyze huge data sets and quickly spot trends, Lazzaro says. For a major retailer, that could mean scouring Facebook or Twitter user data to learn what scarf colors were in fashion last season, then comparing that information with today's hot color trends to help determine what will sell this season.

"It gives you the ability to look back in time to look for opportunities for new sales," Lazzaro says. This plays out at Concurrent when the firm analyzes a commercial or ad for a car dealership. "We can look at the data to see who's watched the commercials; then you might have a targeted sales lead you can leverage to make a sale. You don't always know what you are looking for."

Hadoop "has really changed the landscape for us," says Hugh Williams, vice president of experience, search and platforms at eBay.

Traditional databases can work for many sorting and analysis needs, but with ultra-large data sets, Hadoop can be a much more efficient way to find things, Lazzaro says. "It's really built for handling that."

For their part, eBay's engineers "like being able to work with unstructured data ... and build new products for eBay quickly," Williams says. Because eBay engineers can access the firm's 300 million listings, historical information and vast amounts of related information, Williams says, "this allows us to understand customers and build experiences they want." It's not really about the structured versus unstructured issue; rather, "it's about our engineers being able to roll up their sleeves and work with our data like never before," he says.

In the last year, eBay has done "some really amazing things with Hadoop, including improvements in merchandising, buyer experience and how customers use the site," Williams says.

During the year, for instance, eBay staffers can see when customers start typing in Halloween queries and Christmas queries. "With that I can tell you the kinds of things people are looking for. We didn't comprehend this use of the data five years ago -- not at all."
