It's All About the Data
Machine learning is enabled by clever algorithms, of course, but what has driven it to prominence in recent years is the availability of huge amounts of data, both from the Internet and, more recently, from a proliferation of physical sensors. Carlos Guestrin, an assistant professor of computer science and machine learning at Carnegie Mellon University, combines sensors, machine learning and optimization to make sense of large amounts of complex data.
For example, he says, scientists at the University of Southern California and the University of California, Los Angeles, put sensors on robotic boats to detect and analyze destructive algae blooms in waterways. AI algorithms learned to predict the location and growth of the algae. Similarly, researchers at Carnegie Mellon put sensors in a local water-distribution system to detect and predict the spread of contaminants. In both cases, machine learning enabled better predictions over time, while optimization algorithms identified the best sites for the expensive sensors.
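To make the sensor-siting step concrete: it can be framed as a coverage problem, where each candidate site detects some set of possible spread scenarios and the goal is to detect as many as possible within a fixed budget. The sketch below shows the standard greedy heuristic for such problems; it is a minimal illustration, not Guestrin's actual system, and the sites, budget and coverage sets are hypothetical.

```python
# A minimal sketch of greedy sensor placement as coverage maximization.
# Hypothetical inputs: in a real system, the coverage sets would come from
# simulations of how contaminants or algae spread from each location.

def greedy_placement(coverage, budget):
    """Pick sites one at a time, always taking the site that covers
    the most not-yet-covered scenarios (the classic greedy heuristic
    for coverage-style objectives)."""
    chosen, covered = [], set()
    for _ in range(budget):
        best = max(coverage, key=lambda s: len(coverage[s] - covered))
        if not coverage[best] - covered:
            break  # no remaining site adds any new coverage
        chosen.append(best)
        covered |= coverage[best]
    return chosen

# Hypothetical example: which spread scenarios each candidate site detects.
coverage = {
    "site_A": {1, 2, 3},
    "site_B": {3, 4},
    "site_C": {4, 5, 6},
    "site_D": {1, 6},
}
print(greedy_placement(coverage, budget=2))  # ['site_A', 'site_C']
```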
Guestrin is also working on a system that searches a huge number of blogs and, based on a user's browsing history and preferences, identifies the few that user should read each day. He says the task may sound completely different from predicting the spread of contaminants via sensors, but it's not.
"Contaminants spreading through the water distribution system are basically like stories spreading through the Web," he says. "We are able to use the same kind of modeling ideas and algorithms to solve both problems."
Guestrin says AI-enabled tools like the blog filter may take on importance far beyond their ability to save us a few minutes a day. "We are making decisions about our lives -- whom we elect, and what issues we find important -- based on very limited information. We don't have time to do the kind of analysis we need to make informed decisions. As the amount of information increases, our ability to make good decisions may actually decrease. Machine learning and AI can help."
Microsoft Research has combined sensors, machine learning and analysis of human behavior in a road traffic prediction model. Predicting traffic bottlenecks would seem to be an obvious and not very difficult application of sensors and computer forecasting. But MSR realized that most drivers hardly need to be warned that the interstate heading out of town will be jammed at 5 p.m. on Monday. What they really need to know is where and when anomalies, or "surprises," are occurring and, perhaps more important, where they will occur.
So MSR built a "surprise forecasting" model that learns from traffic history to predict surprises 30 minutes in advance based on actual traffic flows captured by sensors. In tests, it has been able to predict about 50% of the surprises on roads in the Seattle area, and it is in use now by several thousand drivers who receive alerts on their Windows Mobile devices.
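A stripped-down version of the idea: learn what traffic usually looks like at each time of day, call large deviations from that baseline "surprises," and train a model to predict them a fixed horizon ahead. Everything in the sketch below is synthetic and hypothetical; MSR's actual model draws on far richer signals than a single deviation feature.

```python
# A minimal sketch of "surprise" forecasting on synthetic traffic data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
days, slots = 60, 288  # 5-minute slots per day
rhythm = 60 - 20 * np.sin(np.linspace(0, 2 * np.pi, slots))  # typical speeds
# Persistent anomalies: a slow random drift within each day.
drift = np.cumsum(rng.normal(0, 1.0, size=(days, slots)), axis=1)
speeds = rhythm + drift

baseline = speeds.mean(axis=0)  # what traffic "usually" looks like per slot
surprise = (np.abs(speeds - baseline) > 12).astype(int)

# Feature: how far traffic is from baseline now.
# Label: whether conditions are "surprising" 30 minutes (6 slots) later.
lag = 6
X = np.abs(speeds - baseline)[:, :-lag].reshape(-1, 1)
y = surprise[:, lag:].reshape(-1)

model = LogisticRegression().fit(X, y)
print("training accuracy:", round(model.score(X, y), 3))
```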
Few organizations need to make sense of as much data as search engine companies do. For example, if a user searches Google for "toy car" and then clicks on a Wal-Mart ad that appears at the top of the results, what's that worth to Wal-Mart, and how much should Google charge for that click? The answers lie in an AI specialty that employs "digital trading agents," which companies like Wal-Mart and Google use in automated online auctions.
Michael Wellman, a University of Michigan professor and an expert in these markets, explains: "There are millions of keywords, and one advertiser may be interested in hundreds or thousands of them. They have to monitor the prices of the keywords and decide how to allocate their budget, and it's too hard for Google or Yahoo to figure out what a certain keyword is worth. They let the market decide that through an auction process."
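The mechanism behind these keyword auctions is, historically, a variant of the generalized second-price (GSP) auction: advertisers are ranked by bid, and each pays roughly the bid of the advertiser ranked just below. The sketch below omits real-world refinements such as quality-score weighting; the bidders and amounts are hypothetical.

```python
# A minimal sketch of a generalized second-price (GSP) keyword auction.
# Bidders and bid amounts are hypothetical.

def gsp_auction(bids, num_slots):
    """Return (bidder, price_paid_per_click) for each ad slot."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    results = []
    for i in range(min(num_slots, len(ranked))):
        bidder, _ = ranked[i]
        # Each winner pays the bid of the advertiser ranked just below.
        price = ranked[i + 1][1] if i + 1 < len(ranked) else 0.0
        results.append((bidder, price))
    return results

bids = {"retailer_A": 1.50, "retailer_B": 1.20, "retailer_C": 0.80}
print(gsp_auction(bids, num_slots=2))
# [('retailer_A', 1.2), ('retailer_B', 0.8)]
```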