DO NO HARM?

The problem with the trolley problem

… especially when it comes to AI.

Stop me if you’ve heard this one before.

Imagine you’re driving a trolley car. Suddenly the brakes fail, and on the track ahead of you are five workers you’ll run over. Now, you can steer onto another track, but on that track is one person who you will kill instead of the five: It’s the difference between unintentionally killing five people and intentionally killing one. What should you do?

Philosophers call this the “trolley problem,” and it seems to be getting a lot of attention these days—especially as it relates to autonomous vehicles. A lot of people seem to think that solving this thorny dilemma is necessary before we allow self-driving cars onto our roads. How else will they be able to decide who lives or dies when their algorithms make split-second decisions?

I can see how this happened. The trolley problem is part of almost every introductory course on ethics, and it’s about a vehicle killing people. How could an “ethical” self-driving car not take a view on it, right? 

However, there’s just one problem: The trolley problem doesn’t really have anything to do with the ethics of AI—or even driving.

The original trolley problem came from a paper about the ethics of abortion, written by English philosopher Philippa Foot. It’s the third of three increasingly complicated thought experiments she offers to help readers assess whether intentionally harming someone (e.g. choosing to hit them) is morally equivalent to merely allowing harm to occur (choosing not to intervene to stop them getting hit). As the trolley driver, you are not responsible for the failure of the brakes or the presence of the workers on the track, so doing nothing means the unintentional death of five people. However, if you choose to intervene and switch tracks, then you intentionally kill one person as a means of saving the other five.

This philosophical issue is irrelevant to self-driving cars because they don’t have intentions. Those who think intentions are morally significant tend to hold strong views about our freedom of will. But machines don’t have free will (at least not yet). Thus, as far back as Isaac Asimov’s three laws of robotics, we’ve recognized that, for machines, harming someone and, “through inaction, allow[ing] a human being to come to harm,” are morally equivalent.

Without this distinction, the trolley problem holds little interest for philosophers, nor should it. Sure, some people have used cases like this to uncover people’s beliefs about how to make trade-offs between human lives. However, such studies merely replicate social prejudices and say little that is deep or significant. (Young lives count for more than old ones, for example, but thin people also count for more than fat ones.)

What’s more, the trolley problem specifically disregards just about every aspect of ethical behavior that is relevant to self-driving cars.

For instance, the trolley runs on rails, so its driver knows for certain that they will hit either five people or one person. However, self-driving cars must navigate environments of profound uncertainty, where they must continuously guess how others will react. Any driver knows how important this is and will have come up with their own theories about driver and pedestrian behavior. How can we teach machines this level of understanding and empathy? Psychologists and computer scientists are working on it, but philosophers aren’t that interested.

The trolley also suffers a catastrophic brake failure, so that its driver faces no blame for the deaths that would result from their inaction. However, self-driving cars must continuously monitor their performance and decide what risk of suffering such a failure would be acceptable, and when to pull over and call in a breakdown team. National regulators have been determining such safety parameters for everything from track maintenance to novel pharmaceuticals for years, and they make some fascinating distinctions: Airplanes have to be much safer than cars, for instance. Philosophers are usually happy to let them get on with it.

Finally, the trolley driver (we can assume) will struggle with their choice, will feel sympathy and distress for those who die, and will be haunted by this incident for years to come. If they were to climb out of the wreckage proclaiming, “I did the right thing” and get on with their life, we would think they were some kind of monster. On one level, we might feel that self-driving cars, since they feel no emotions, are like the driver who could get back to work without a second thought. How could such impartial spectators to their actions ever be truly ethical? 

However, I think that is too quick. Moral philosophers like to think in terms of universal laws of behavior that tell everyone what to do, all the time. Self-driving cars, on the other hand, don’t work by following such rules. Modern AI, based on machine learning, is all about continually improving decision making to achieve the desired result (like safe and efficient transport) and avoid anything undesirable (like deaths). This can lead AI systems to function in unexpected ways, both brilliant and bizarre, and means that they will forever be learning from their mistakes and working out how to do better. What driver can say the same thing?

If we want to make ethical AI, we should be harnessing this ability and training self-driving cars to work in ways that we, as a society, can approve of and endorse. That is very unlikely to result from starting with philosophical first principles and working from there. Either philosophers of AI ethics like me have to discard a lot of our theoretical baggage, or others should take on this work instead.