Stuart Russell wrote the textbook on AI - now he wants to save us from catastrophe

In Harlan Ellison's short story 'I Have No Mouth, and I Must Scream', an 'Allied Mastercomputer' - AM - gains sentience and exterminates the human race save for five people. The AI has developed a deep malice toward its creators, and the survivors are kept alive to be tortured through endless, inescapable simulations.

The story was first published in 1967. Existential arguments against playing God have long been of interest in the worlds of fiction, religion, and philosophy. Now, however, organisations with little limit to their capital or ambition are racing to create a general artificial intelligence system, underpinned by the belief that the 'singularity' - the point at which an AI becomes smarter than a human - will bring with it benefits to all of humankind.

More than two decades ago, Stuart Russell co-authored 'Artificial Intelligence: A Modern Approach' with Peter Norvig, which quickly became a staple textbook for students studying AI. Russell, now professor of computer science at the University of California, Berkeley, is warning that civilisation needs to take urgent steps to avoid sleepwalking into a potentially world-ending catastrophe.

"I wrote my first AI test program when I was in school," says Russell, speaking with Techworld at the IP Expo conference in London's Docklands in late September. "It's always seemed to me that AI is an incredibly important problem and if we solve it it's going to have a huge impact."

One question no one seemed to be answering: what if we succeed? Russell has been asking it publicly since the first edition of his textbook was published in 1995. "Since we're all trying to get there, we should ask what happens when we get there," he says.

Businesses and academics are busily working away on the development of artificial intelligence, and while a better-than-human general AI might be decades or more away, a system that is smarter than us could very well take us by surprise.

What do we want?

"Things that are more intelligent than you can get what they want. It doesn't really matter what you want. Just like the gorillas don't get what they want any more: the humans get what they want. And so, how do we prevent that problem?"

One way to address this is simply to pull the plug on the machine, but if a system is smarter than you, it will probably have considered that already.

"I think the way we prevent that problem is by designing machines in such a way that constitutionally the only thing they want is what we want," Russell says. "Now the difficulty with that is we don't know how to say what we want. We're not even sure that we know what we want in a way we can express - so we can't just put it into the machine.

"That means the machines are going to be designed to want only want we want. But they don't know what it is."

Russell, then, is "exploring the technical consequences of that way of thinking". As you might imagine, that creates more and more questions, both mathematical and existential in nature.

"Really what it means is the machines have to learn from all possible sources of information what it is that humans really want," he says. "What would make them unhappy? What would constitute a catastrophic outcome? We can imagine outcomes that we would say: yes, that's definitely catastrophic.

"I definitely don't want all humans to be subject to guinea pig experiments on cancer drugs.

"I definitely don't want all of the oxygen in the atmosphere to be eliminated and everyone asphyxiated."

Others would be harder to anticipate: for example, a "gradual enfeeblement" where Wall-E-like machines keep us "fat, stupid, lazy, obese, and useless".

"That'd be something where now we could say, we definitely don't want that," he says. "But we could go down that slippery slope where we thought that was great.

"So it's a very complicated set of questions and it really does involve these philosophical issues. What are human preferences? Do you mean what you believe you will prefer in the future, or do you mean what you actually prefer at the time?"

The Midas Problem

In Greek mythology, when King Midas first gained the power to turn any object he touched to gold, he was in a state of greedy euphoria, but he quickly came to curse it.

This story of the 'Midas problem' has endured in some form or another for centuries with good reason - in short, be careful what you wish for.

The crucial point, Russell says, is that humans should not try to define what a machine's objective is; instead, the machine should infer our preferences from the choices we make.

"We absolutely want to avoid trying to write down human preferences because those are the problem," Russell says. "If we get it wrong, which we invariably will, we have a single-minded machine that's pursuing an objective that we gave it and it's going to pursue it until it achieves it."

"If it's got the wrong objective, we get what we said we wanted but we'll be extremely unhappy about it," he says. "A machine that learns about human preferences by behaviour has an incentive to ask questions.

"Is it okay if we run some experiments on cancer drugs on all these people? No. That's not okay. You would have to volunteer and you would probably have to pay them, and we'd do it the way we normally do it.

"When a machine believes that it knows what the objective is, it has no incentive to ask questions, to say: is it okay if I do this or do you want it this way or this way? It has no reason to do that. This is a whole new area of research.

"I'm reasonably optimistic we can make this work, and make systems we can prove mathematically will leave us happier that we built them that way."

How do you like your limbs?

To achieve this, it will be necessary to retreat from the broader philosophical questions and think more simply.

Russell asks: "When I say human values I mean something very simple: which life would you like, a life with or without your left leg?"

The answer seems obvious but machines don't know much about us: they don't know that we appreciate our limbs, they don't know we'd rather not go hungry, and they don't know that we tend, generally speaking, to like being alive.

"What we are trying to avoid is that machines behave in such a way it violates these basic human preferences," says Russell. "We are not trying to build an ideal value system. There is no ideal value system. Your preferences are different from mine, our preferences are way different from someone who grew up in a different culture altogether.