Hadoop hype and data Yodas: Tales from Predictive Analytics World

Here's a quick rundown of interesting observations, presentations and comments heard at this week's Predictive Analytics World and eMetrics conferences in San Francisco.

Hadoop is the symptom

This year the hot topic in analytics is Hadoop -- no surprise there. Everyone on the exhibitors' floor seemed to have a "Hadoop solution." But Hadoop is not a solution, says Fiona McNeill, product marketing manager at SAS Institute. "It's a symptom, not the cure. People are storing stuff on Hadoop just because they can." She thinks that's going to catch up with organizations, many of which are already struggling to process mountains of data. In the next two to four years, she predicts, "These analytics conferences will all be about data governance."

Data Scientists: High on the hype cycle

What is a data scientist, exactly? "A data scientist is a data analyst who lives in California," quipped George Roumeliotis, data scientist for business intelligence optimization at Intuit. He's only half joking. The point, he says, is that you don't need someone with a fancy title to do what is often fairly straightforward work -- data tabulation, linear regression, decision trees, logistic regression and so on. "Data science is overhyped right now. You can achieve a lot with simple tools and methodologies, a little imagination and business insight."

Text analytics: Now tell us how you really feel

"In structured data [such as check-box surveys] people rate how they feel higher than what it actually is. In a free format, commentary are less positive," says Richard Foley, product manager, text analytics at SAS Institute.

Got Yoda?

Analytics relies on access to data. But who owns the various systems and data repositories in a large organization -- and what data is usable? Every analytics team needs to seek out its "data Yoda," someone who knows all of the different systems and how data traverses them, says Dylan Lewis, Web business analyst with the consumer group at Intuit. Intuit's Yoda knows how each system works. She doesn't have all the answers, he says, but "she knows who to call."

Facebook privacy settings no roadblock for marketers

"Social media is a free source of data that wasn't available before," says Richard Foley, product manager, text analytics at SAS Insitutute. Facebook, however, is different from services like Twitter in that its privacy settings can limit what data marketers can get. "If they say 'I only want this visible to friends' you can’t get their comments," he says. But that turns out not to be much of a challenge. "Most people just make everything public on Facebook," he says.

TurboTax Online uses analytics to help close the deal...

"With TurboTax we only get paid when you hit the submit button," says Lewis at Intuit. In a half billion dollar business, even tiny reductions in the abandonment rate are worth millions in incremental revenue. So Intuit is trying to predict, in real time, whether a customer is at imminent risk of abandoning. "At that point can offer live help," he says.

...and it does so by conducting experiments while the bus is moving

Making changes to the TurboTax website at the height of tax season sounds like a very bad idea. But that's exactly what Intuit does. "We use predictive analytics to answer questions. We segment traffic, design experiments, collect the data, analyze and recommend." The Intuit analytics team will make 500 different changes over the course of the 2½-month tax season, and during that time it may perform up to 70 different tests in any given week.

"This is how you do it. Do as many optimization experiments as you can to make the experience better," says Lewis.
