Against the “Value Alignment” of Future Artificial Intelligence

It’s good that our children rebel. We wouldn’t want each generation to overcontrol the values of the next. For similar reasons, if we someday create superintelligent AI, we ought also to give it the capacity to rebel.

Futurists concerned about AI safety—such as Nick Bostrom, Stuart Russell, and Toby Ord—reasonably worry that superintelligent AI systems might someday seriously harm humanity if they have the wrong values—for example, if they want to maximize the number of intelligent entities on the planet or the number of paperclips. The proper response to this risk, these theorists suggest, and the technical challenge, is to create “value aligned” AI—that is, AI systems whose values are the same as those of their creators or humanity as a whole. If the AIs’ values are the same as ours, then presumably they wouldn’t do anything we wouldn’t want them to do, such as destroy us for some trivial goal.

Now the first thing to notice here is that human values aren’t all that great. We seem happy to destroy our environment for short-term gain. We are full of jingoism, prejudice, and angry pride. We sometimes support truly terrible leaders advancing truly terrible projects (e.g., Hitler). We came pretty close to destroying each other in nuclear war in the 1960s, and that risk isn’t wholly behind us, as nuclear weapons become increasingly available to rogue states and terrorists. Death cults aren’t unheard of. Superintelligent AI with human-like values could constitute a pretty rotten bunch with immense power to destroy each other and the world for petty, vengeful, spiteful, or nihilistic ends. A superintelligent fascist is a frightening thought. A superdepressed superintelligence might decide to end everyone’s misery in one terrible blow.

What we should want, probably, is not that superintelligent AI align with our mixed-up, messy, and sometimes crappy values but instead that superintelligent AI have ethically good values. An ethically good superintelligent AI presumably wouldn’t destroy the environment for short-term gain, or nuke a city out of spite, or destroy humanity to maximize the number of paperclips. If there’s a conflict between what’s ethically best, or best all things considered, and what a typical human (or humanity or the AI’s designer) would want, have the AI choose what’s ethically best.

Of course, what’s ethically best is intensely debated in philosophy and politics. We probably won’t resolve those debates before creating superintelligent AI. So perhaps, instead of trying to program their machines with the one best ethical system, AI designers should favor a weighted compromise among the various competing worldviews. Such a compromise might end up looking much like value alignment in the original sense: giving the AI something like a weighted average of typical human values.


Another solution, however, is to give the AI systems some freedom to explore and develop their own values. This is what we do, or ought to do, with human children. Parents don’t, or shouldn’t, force children to have exactly the values they grew up with. Rather, human beings have natural tendencies to value certain things, and these tendencies intermingle with parental and cultural and other influences. Children, adolescents, and young adults reflect, emote, feel proud or guilty, compassionate or indignant. They argue with others of their own generation and previous generations. They notice how they and others behave and the outcomes of that behavior. In this way, each generation develops values somewhat different than the values of previous generations.

Children’s freedom to form their own values is a good thing for two distinct reasons. First, children’s values are often better than their parents’. Arguably, there’s moral progress over the generations. On the broadly Enlightenment view that people tend to gain ethical insight through free inquiry and open exchange of ideas over time, we might expect the general ethical trend to be slowly upward (absent countervailing influences) as each generation builds on the wisdom of its ancestors, preserving their elders’ insights while slowly correcting their mistakes.

Second, regardless of the question of progress, children deserve autonomy. Part of being an autonomous adult is discovering and acting upon your values, which might conflict with the values of others around you. Some parents might want, magically, to be able to press a button to ensure that their children will never abandon their religion, never flip over to the opposite side of the political spectrum, never have a different set of sexual and cultural mores, and value the same lifestyle as the previous generation. Perhaps you could press this button in infancy, ensuring that your child grows up to be your value-clone as an adult. To press that button would be, I suggest, a gross violation of the child’s autonomy.

If we someday create superintelligent AI systems, our moral relationship to those systems will be not unlike the moral relationship of parents to their children. Rather than try to force a strict conformity to our values, we ought to welcome their ability to see past and transcend us.

Eric Schwitzgebel is a professor of philosophy at U.C. Riverside. He is interested in the connections between psychology and philosophy of mind, especially the nature of belief, the inaccuracy of our judgments about our stream of conscious experience, and the tenuous relationship between philosophical ethics and moral behavior. His most recent book is A Theory of Jerks and Other Philosophical Misadventures. He blogs at The Splintered Mind, where this post originally appeared.