The machine learning problem of the next decade

How can businesses integrate imperfect machine-learning algorithms into their workflow?

A few months ago, my company, CrowdFlower, ran a machine learning competition on Kaggle. It perfectly highlighted the biggest opportunity (and challenge) with machine learning: What do you do with an 80% accurate algorithm?

We uploaded data collected on our platform and Kaggle sent it out to over 1,000 data scientists, who competed to see who could build the best search model.

The simplest approach gave a baseline accuracy of 32%. Within hours a team beat that with a 35% accurate model. By the next morning, one team already had a 53% accurate model.

[Chart: accuracy of the best submission over the first four days of the contest]

Extrapolating from the first four days of our 60-day contest, you might expect the winning accuracy to get close to 100%.

But in fact, this is what happened:

[Chart: accuracy of the best submission over the full 60-day contest]

The winning entry -- submitted by Chenglong Chen -- was just 6% more accurate than the best model submitted a week into the contest.

And it wasn’t for a lack of trying! As the Kaggle competition went on, more and more teams entered and existing teams refined and resubmitted their entries:

[Chart: number of teams and submissions over the course of the contest]

Given that over 1,000 smart data scientists worked on this task, it's fair to say that the winning 71% accuracy is very close to the best possible with today's technology.

What does this mean for the future of machine learning?

These results are familiar to anyone who has ever worked on an A.I. project. For the first couple of weeks performance improves steadily, and then you hit a wall. Maybe you have a breakthrough or two, but there's no way to put a plan or a process around breakthroughs.

Every engineering project has delays and issues, but machine-learning projects are harder to manage than any other. In the first week you might go from zero to 80% accuracy. The next 20% might take you another week, a month or a lifetime -- it's impossible to tell.

How do you make an 80% accurate model useful? Until we're replaced by robots, this is going to be the machine learning challenge of the next decade.

We need to get humans and computers to work together.

Driving toward accuracy

The obvious way to use an 80% accurate algorithm is to use it only in the 80% of cases where it's correct. That's easier said than done. A self-driving car that can read the road with 100% accuracy would basically mean the end of human drivers, parking lots and traffic jams, and could fundamentally change the way cities work. But what do we do with a self-driving car that's 99% accurate? Without careful deployment, that car could kill its passengers 1% of the time. And we are going to have 99% accurate self-driving cars long before we have 100% accurate driving algorithms.

How accurate are self-driving cars today?  

[Chart: Google self-driving car disengagement data]

Google recently released data on its self-driving cars, tracking the number of miles between "disengagements," or situations where a human had to take over. Google errs on the side of caution and reports that only around 15% of the disengagements would have resulted in "contact" -- or an accident -- if control hadn't been handed over to a human driver. But that’s still on the order of 30,000 miles between potential crashes, while human drivers go on the order of 1 million miles between potential crashes and 100 million miles between fatal crashes.
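
To see how these figures fit together, here's a back-of-the-envelope calculation using the approximate numbers above; the implied per-disengagement mileage is an inference for illustration, not a figure Google reports:

```python
# Back-of-the-envelope comparison of self-driving and human safety margins,
# using the approximate figures quoted above.
contact_fraction = 0.15             # share of disengagements that would have caused contact
miles_per_potential_crash = 30_000  # self-driving, per the Google data above

# Implied miles between disengagements of any kind
miles_per_disengagement = miles_per_potential_crash * contact_fraction
print(f"Implied miles per disengagement: {miles_per_disengagement:,.0f}")  # ~4,500

# Human benchmark quoted above
human_miles_per_potential_crash = 1_000_000

gap = human_miles_per_potential_crash / miles_per_potential_crash
print(f"Humans go roughly {gap:.0f}x farther between potential crashes")  # ~33x
```

In other words, by these numbers, human drivers still go roughly 33 times farther between potential crashes than Google's cars did.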

Will Google and others blow through the gap between automated drivers and human drivers, or will they get stuck on a fixed accuracy level, as happened in CrowdFlower's Kaggle competition?

Google's cars will be useful either way, because even an imperfect machine-learning algorithm can usually give you a very good assessment of its accuracy on any particular prediction.

Google's cars are fantastic at knowing that something funny is going on and handing control back to the driver. If they couldn't do this, they would be too unsafe to operate at all.
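
Here's a minimal sketch of that kind of confidence-based handoff, using scikit-learn; the model, synthetic data and 0.9 threshold are all illustrative assumptions, not anything Google has published:

```python
# Human-in-the-loop handoff: act on confident predictions automatically,
# escalate uncertain ones to a person. Everything here is illustrative.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = model.predict_proba(X_test)   # per-example class probabilities
confidence = proba.max(axis=1)        # the model's own certainty per prediction

THRESHOLD = 0.9                       # assumed cutoff for full automation
automated = confidence >= THRESHOLD   # handle these cases automatically
escalated = ~automated                # hand these cases to a human

auto_acc = (model.predict(X_test)[automated] == y_test[automated]).mean()
print(f"Automated {automated.mean():.0%} of cases at {auto_acc:.0%} accuracy; "
      f"escalated {escalated.mean():.0%} to humans")
```

Raising the threshold shrinks the automated slice but makes it more reliable, which is exactly the lever behind the check-reading ATMs described below.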

This doesn't just go for cars: Financial algorithms that priced mortgage-backed securities operated with a very bad estimate of their own certainty, and that miscalibration helped cause the subprime mortgage crisis. Even ATMs that automatically read the checks you deposit will still hand funny-looking or suspicious checks to a human for a second opinion.

The design of the handoff to a human is critically important to making this system work. A good model needs to express how uncertain it is and why it is uncertain. Human operators, in turn, need to learn to interpret that information and use it to make themselves more efficient.
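
One concrete way to check whether a model's stated uncertainty can be trusted is to bucket its predictions by confidence and compare stated confidence against observed accuracy. This sketch continues the illustrative model, X_test and y_test from the handoff example above:

```python
# Calibration check: within each confidence bucket, does the model's stated
# confidence match its observed accuracy? (Reuses the illustrative model,
# X_test and y_test from the handoff sketch above.)
import numpy as np

confidence = model.predict_proba(X_test).max(axis=1)  # stated certainty
correct = model.predict(X_test) == y_test             # was it actually right?

edges = np.linspace(0.5, 1.0, 6)               # five buckets for a binary model
bucket = np.digitize(confidence, edges[1:-1])  # 0..4; top bucket includes 1.0
for b in range(5):
    in_bin = bucket == b
    if in_bin.any():
        print(f"stated {edges[b]:.0%}-{edges[b + 1]:.0%}: observed accuracy "
              f"{correct[in_bin].mean():.0%} on {in_bin.sum()} cases")
```

If the stated and observed numbers diverge badly, the confidence scores need recalibration before anyone builds a handoff, or a trading desk, around them.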

Good interfaces depend on the application. Medical diagnosis systems work best in the real world when they guide a doctor toward information that will help them make an informed decision. Fraud detection algorithms need to give humans good visualizations to understand what's important in massive data sets.  

With Microsoft's Azure ML and IBM's investment in Watson, making models is easier than ever. Companies no longer need a Google-size R&D budget to make machine learning applicable to their business. These models aren't perfect, but they're useful.  The new challenge for businesses is how to integrate an imperfect machine-learning algorithm into their existing workflow.
