As Russell sees it, today’s goal-oriented AI is ultimately limited, for all its success at accomplishing specific tasks like beating us at Jeopardy! and Go, identifying objects in images and words in speech, and even composing music and prose. Asking a machine to optimize a “reward function” — a meticulous description of some combination of goals — will inevitably lead to misaligned AI, Russell argues, because it’s impossible to include and correctly weight all goals, subgoals, exceptions and caveats in the reward function, or even know what the right ones are. Giving goals to free-roaming, “autonomous” robots will be increasingly risky as they become more intelligent, because the robots will be ruthless in pursuit of their reward function and will try to stop us from switching them off.
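A toy illustration of the argument above, not an example from Russell's work: the robot, the plans, and every number here are hypothetical. The written-down reward counts only dust, while the human also cares about a caveat nobody thought to specify.

```python
# Hypothetical plans for a cleaning robot; names and numbers are invented.
PLANS = {
    "careful_sweep": {"dust_removed": 8, "vases_broken": 0},
    "fast_sweep": {"dust_removed": 10, "vases_broken": 1},
}


def specified_reward(outcome):
    """The reward the designer wrote down: only dust counts."""
    return outcome["dust_removed"]


def true_value(outcome):
    """What the human actually wants, including an unstated caveat (assumed weight)."""
    return outcome["dust_removed"] - 50 * outcome["vases_broken"]


# The optimizer sees only the specified reward, so it ignores the vase.
chosen = max(PLANS, key=lambda name: specified_reward(PLANS[name]))
preferred = max(PLANS, key=lambda name: true_value(PLANS[name]))

print("robot picks:", chosen)
print("human wanted:", preferred)
```

The two answers differ because the caveat never made it into the reward, which is the failure mode Russell describes.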
Instead of machines pursuing goals of their own, the new thinking goes, they should seek to satisfy human preferences; their only goal should be to learn more about what our preferences are. Russell contends that uncertainty about our preferences and the need to look to us for guidance will keep AI systems safe. In his recent book, Human Compatible, Russell lays out his thesis in the form of three “principles of beneficial machines,” echoing Isaac Asimov’s three laws of robotics from 1942, but with less naivete. Russell’s version states:
- The machine’s only objective is to maximize the realization of human preferences.
- The machine is initially uncertain about what those preferences are.
- The ultimate source of information about human preferences is human behavior.
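One way to read the three principles together is as Bayesian inference over what a person wants. The sketch below is a minimal illustration under stated assumptions, not Russell's implementation: the two candidate preference profiles, the drink options, and the "noisily rational" softmax choice model with its `rationality` parameter are all hypothetical.

```python
import math

# Hypothetical candidate preference profiles (utility of each option).
HYPOTHESES = {
    "prefers_coffee": {"coffee": 1.0, "tea": 0.0},
    "prefers_tea": {"coffee": 0.0, "tea": 1.0},
}


def choice_likelihood(choice, utilities, rationality=2.0):
    """Probability that a noisily rational human picks `choice` (softmax model, assumed)."""
    weights = {opt: math.exp(rationality * u) for opt, u in utilities.items()}
    return weights[choice] / sum(weights.values())


def update_belief(belief, choice):
    """One Bayesian update of the belief over hypotheses after observing a choice."""
    unnormalized = {
        name: prior * choice_likelihood(choice, HYPOTHESES[name])
        for name, prior in belief.items()
    }
    total = sum(unnormalized.values())
    return {name: value / total for name, value in unnormalized.items()}


def expected_utility(option, belief):
    """Utility of an option averaged over the machine's current uncertainty."""
    return sum(p * HYPOTHESES[name][option] for name, p in belief.items())


# Principle 2: the machine starts out uncertain.
belief = {name: 1.0 / len(HYPOTHESES) for name in HYPOTHESES}

# Principle 3: behavior is the evidence, even when imperfect (one "coffee" slip).
for observed_choice in ["tea", "tea", "coffee", "tea"]:
    belief = update_belief(belief, observed_choice)

# Principle 1: act to maximize the expected realization of the human's preferences.
best = max(["coffee", "tea"], key=lambda opt: expected_utility(opt, belief))
print(belief, best)
```

Because the choice model tolerates noise, a single inconsistent observation shifts the belief without overturning it.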
Over the last few years, Russell and his team at Berkeley, along with like-minded groups at Stanford, the University of Texas and elsewhere, have been developing innovative ways to clue AI systems in to our preferences, without ever having to specify those preferences.
These labs are teaching robots how to learn the preferences of humans who never articulated them and perhaps aren’t even sure what they want. The robots can learn our desires by watching imperfect demonstrations and can even invent new behaviors that help resolve human ambiguity. (At four-way stop signs, for example, self-driving cars developed the habit of backing up a bit to signal to human drivers to go ahead.) These results suggest that AI might be surprisingly good at inferring our mindsets and preferences, even as we learn them on the fly.
“These are first attempts at formalizing the problem,” said Sadigh. “It’s just recently that people are realizing we need to look at human-robot interaction more carefully.”
Whether the nascent efforts and Russell’s three principles of beneficial machines really herald a bright future for AI remains to be seen. The approach pins the success of robots on their ability to understand what humans really, truly prefer — something that the species has been trying to figure out for some time. At a minimum, Paul Christiano, an alignment researcher at OpenAI, said Russell and his team have greatly clarified the problem and helped “spec out what the desired behavior is like — what it is that we’re aiming at.”
How to Understand a Human
Russell’s thesis came to him as an epiphany, that sublime act of intelligence. It was 2014 and he was in Paris on sabbatical from Berkeley, heading to rehearsal for a choir he had joined as a tenor. “Because I’m not a very good musician, I was always having to learn my music on the metro on the way to rehearsal,” he recalled recently. Samuel Barber’s 1967 choral arrangement Agnus Dei filled his headphones as he shot beneath the City of Light. “It was such a beautiful piece of music,” he said. “It just sprang into my mind that what matters, and therefore what the purpose of AI was, was in some sense the aggregate quality of human experience.”
Robots shouldn’t try to achieve goals like maximizing viewing time or paper clips, he realized; they should simply try to improve our lives. There was just one question: “If the obligation of machines is to try to optimize that aggregate quality of human experience, how on earth would they know what that was?”