In artificial intelligence, an alignment problem arises when a system’s behaviour doesn’t uphold the intended values of its designers, particularly when AI systems are used to solve sticky social problems.
I study machine-human social relationships, mainly in collaboration with artificially intelligent robots. In the course of my work I’ve observed how humans relate to machines on a social level, and how people describe relationships with machines in terms very similar to their relationships with other humans: intimate, friendly, fearful, antagonistic, you name it. My great-great-aunt’s typewriter is a member of the family, and I’ve held years-long grudges against uncooperative doors. Common sense might say that machines are inert, cold, unfeeling things that aren’t really deserving of social consideration. But this prevailing cultural attitude doesn’t always match what people encounter in their daily lives. And when it comes to designing and deploying machines, AI or otherwise, there is always a social aspect that makes the result unpredictable.
We produce meaning together
I do my best to respect and understand machines on their own terms, rather than expecting them to be like humans. However, I can’t pretend that I encounter them in a vacuum. As a designer I have an intimate relationship with the machines I design, both in terms of my intentions for them and in terms of the culture of machine-human encounters in which we live. I only ever learn the true meaning of machines after they enter the world, by reflecting on their interactions with myself and other humans. We produce meaning together, regardless of my intentions as a designer, in the messiness of everyday life. Out in the world, entities are constantly de- and re-contextualized as their surroundings change. There can never be a guarantee that a designer’s intentions, or indeed their values, will be maintained.

This isn’t necessarily a bad thing. In the field of new musical instrument design it’s well known that a performer will seldom “colour within the lines” of an instrument’s intended function. Part of this comes from contemporary musical practices of extended techniques (such as unscrewing trumpet valves while playing) and part of it comes from the novelty and lack of established pedagogy for new instruments. Either way, artistic expression can always go beyond the bounds of any rule set, any expectation. I would even say that it’s desirable for a musical instrument to inspire unexpected behaviours. Similarly, autonomous machines (artificially intelligent or otherwise) in contemporary art are often valued precisely for their potential to produce unexpected results. There is a machine-human collaboration at work to discover what is possible and to find meaning. Nothing can go exactly as planned. This can of course go badly, as with Microsoft’s chatbot “Tay”, who had to be shut down when users taught the bot to produce inflammatory and offensive content.
Can we make a digital god?
So why should I expect a machine to perfectly embody the values that I hold dear? And why should I want a machine to preserve my current values when I myself am constantly growing and changing in the world? It seems to me that “the alignment problem” is a problem that comes with trying to offload moral questions onto machine systems. Can we make a machine that ensures fairness in corporate hiring practices? Can we make a machine that fixes racial bias in law enforcement? Can we make a machine to govern us with benevolence, towards prosperity for all? What about a machine to reach spiritual enlightenment on our behalf? Or, regarding the quest for “general” artificial intelligence, can we create an omniscient, omnipotent machine? Can we make a digital god? These kinds of questions are, to me, fundamentally misguided.

Relying on machines to solve our problems amounts to an abdication of responsibility. They might be able to help us realize certain dreams in collaboration, but it isn’t guaranteed, nor should it be expected, that their behaviour will autonomously match our intentions. A spiritual robot or a digital god only exists within a larger machine-human system that maintains it.
AI is not a magic wand that can “solve” social problems
Up until now I have been using the word machine in a sense that implies it is an artificial, entirely non-human type of entity. However, machine is a manifold concept that can include humans. For example, “social machine” is a term for distributed machines comprising both artificial and biological (human) components, such as organizations and bureaucracies. So we are already, and have always been, designing machines to address messy social problems, to reach for enlightenment, to realize collective values and dreams. They are messy machines, machines full of contradictions and flaws. With which values, whose values, do social machines align? Surely the labour of alignment is just as messy: a struggle between different values, different dreams, different humans and machines. Even with the most rigorous and carefully implemented design, the outcomes of a complex system cannot be fully predicted.

I’m not here to deny the wide-reaching consequences of implementing AI systems in society. There is a tremendous responsibility in deploying such systems. But ultimately, they are components of larger social machines, and they require continuous and collaborative alignment-labour to promote a given set of values. An alignment problem exists not just in a specific AI product, but in the machinery that produces it. In other words, AI is not a magic wand that can “solve” social problems. For an AI system to embody or promote a set of values, those values must already be present in the process of design. This means applying them at every step of the process (data collection, training, testing, deployment) and, crucially, continuing to apply them in maintenance and redesign. Alignment problems are not only discrete but systemic.
So when you hear a billionaire or a pundit describing their hopes and fears around AI, remember that they are doing work to align our collective efforts with their values. A machine may embody a given value-set when it enters the world, but that is only part of the story. The work of alignment is continuous, collaborative and never complete. Can the alignment problem be solved? I think not, unless society itself can be “solved”. No matter how sophisticated an artificial intelligence is, no matter how perfect its creators claim it to be, it will never behave exactly as intended. There will always be room for realignment.