Open Future

Don’t trust AI until we build systems that earn trust

Progress in artificial intelligence belies a lack of the transparency that is vital for its adoption, says Gary Marcus, co-author of “Rebooting AI”

By K.N.C.

To judge from the hype, artificial intelligence is inches away from ripping through the economy and destroying everyone’s jobs—save for the AI scientists who build the technology and the baristas and yoga instructors who minister to them. But one critic of that view comes from within the tent of AI itself: Gary Marcus.

From an academic background in psychology and neuroscience—rather than computer science—Mr Marcus has long been an AI gadfly. He relishes poking holes in the popular AI technique of deep learning because of its inability to perform abstractions, even as it does an impressive job at pattern-matching. Yet his unease with the state of the art didn’t prevent him from advancing it with his own AI startup, Geometric Intelligence, which he sold to Uber in 2016.

Mr Marcus argues that it would be foolish of society to put too much stock in today’s AI techniques since they are so prone to failure and lack the transparency that researchers need to understand how algorithms reach their conclusions. In classical statistics, the parameters used are determined by people, yet with AI the system itself decides. Though the techniques work—say, identifying that a cell biopsy is cancerous—it’s unclear why they work. This makes it tricky to deploy AI in areas like medicine, aviation, civil engineering, the judiciary and so on.

This is a point Mr Marcus makes with verve and pith in his latest book, “Rebooting AI” (Pantheon, 2019), which he co-wrote with Ernest Davis. As part of The Economist’s Open Future initiative, we asked Mr Marcus about why AI can’t do more, how to regulate it and what teenagers should study to remain relevant in the workplace of the future. The brief interview appears after an excerpt from the book on the need to build trustworthiness into artificial intelligence.

* * *

In AI we trust?

From “Rebooting AI” (Pantheon, 2019) by Gary Marcus and Ernest Davis.

Gods always behave like the people who make them.

—Zora Neale Hurston, Tell My Horse

It is not nice to throw people.

—Anna’s advice to the snow giant Marshmallow in Jennifer Lee’s 2013 Disney film Frozen

Machines with common sense that actually understand what’s going on are far more likely to be reliable, and produce sensible results, than those that rely on statistics alone. But there are a few other ingredients we will need to think through first.

Trustworthy AI has to start with good engineering practices, mandated by laws and industry standards, both of which are currently largely absent. Too much of AI thus far has consisted of short-term solutions, code that gets a system to work immediately, without a critical layer of engineering guarantees that are often taken for granted in other fields. The kinds of stress tests that are standard in the development of an automobile (such as crash tests and climate challenges), for example, are rarely seen in AI. AI could learn a lot from how other engineers do business.

For example, in safety-critical situations, good engineers always design structures and devices to be stronger than the minimum that their calculations suggest. If engineers expect an elevator to never carry more than half a ton, they make sure that it can actually carry five tons. A software engineer building a website that anticipates 10 million visitors per day tries to make sure its server can handle 50 million, just in case there is a sudden burst of publicity.

Failing to build in adequate margins often risks disaster; famously, the O-rings in the Challenger space shuttle worked in warm weather but failed in a cold-weather launch, and the results were catastrophic. If we estimate that a driverless car’s pedestrian detector would be good enough if it were 99.9999 percent correct, we should be adding a decimal place and aiming for 99.99999 percent correct.
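
To make that arithmetic concrete, here is a minimal sketch in Python (an editorial illustration, not material from the book; the fleet-wide detection count is a hypothetical figure chosen only to show the scale):

    # Illustrative only: how much one extra decimal place of reliability matters.
    # The detection count is hypothetical, chosen purely for illustration.

    def expected_misses(reliability, events):
        """Expected number of missed detections, given a per-event reliability."""
        return (1 - reliability) * events

    daily_detections = 10_000_000  # hypothetical pedestrian detections across a fleet, per day

    for reliability in (0.999999, 0.9999999):
        misses = expected_misses(reliability, daily_detections)
        print(f"{reliability} correct -> expected misses per day: {misses:.0f}")

    # 0.999999 correct -> expected misses per day: 10
    # 0.9999999 correct -> expected misses per day: 1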

For now, the field of AI has not been able to design machine-learning systems that can do that. Researchers can’t even devise procedures for making guarantees that given systems work within a certain tolerance, the way an auto part or airplane manufacturer would be required to do. (Imagine a car engine manufacturer saying that their engine worked 95 percent of the time, without saying anything about the temperatures in which it could be safely operated.)

The assumption in AI has generally been that if it works often enough to be useful, then that’s good enough, but that casual attitude is not appropriate when the stakes are high. It’s fine if autotagging people in photos turns out to be only 90 percent reliable—if it is just about personal photos that people are posting to Instagram—but it better be much more reliable when the police start using it to find suspects in surveillance photos. Google Search may not need stress testing, but driverless cars certainly do.

Good engineers also design for failure. They realize that they can’t anticipate in detail all the different ways that things can go wrong, so they include backup systems that can be called on when the unexpected happens. Bicycles have both front brakes and rear brakes partly to provide redundancy; if one brake fails, the second can still stop the bike. The space shuttle had five identical computers on board, to run diagnostics on one another and to be backups in case of failure; ordinarily, four were running and the fifth was on standby, but as long as any one of the five was still running, the shuttle could be operated.
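
As a rough sketch of why redundancy pays off (again an editorial illustration, not material from the book), assume, unrealistically, that components fail independently; the failure probability below is made up:

    # Illustrative only: chance that every one of n redundant components fails,
    # assuming independent failures (real failures are often correlated, so this is optimistic).

    def prob_total_failure(p_single, n):
        """Probability that all n independent components fail, each with probability p_single."""
        return p_single ** n

    p = 0.01  # hypothetical chance that any single component fails during a mission
    for n in (1, 2, 5):
        print(f"{n} component(s): total-failure probability = {prob_total_failure(p, n):.0e}")

    # 1 component(s): total-failure probability = 1e-02
    # 2 component(s): total-failure probability = 1e-04
    # 5 component(s): total-failure probability = 1e-10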

Similarly, driverless car systems shouldn’t just use cameras, they should use LIDAR (a device that uses lasers to measure distance) as well, for partial redundancy. Elon Musk claimed for years that his Autopilot system wouldn’t need LIDAR; from an engineering standpoint, this seems both risky and surprising, given the limitations on current machine-vision systems. (Most major competitors do use it.)

And good engineers always incorporate fail-safes—last-ditch ways of preventing complete disaster when things go seriously wrong—in anything that is mission-critical. […] Good engineers also know that there is a time and a place for everything; experimentation with radically innovative designs can be a game changer when mapping out a new product, but safety-critical applications should typically rely on older techniques that have been more thoroughly tested. An AI system governing the power grid would not be the place to try out some hotshot graduate student’s latest algorithm for the first time. […]

The challenges don’t end there. Once a new technology is deployed, it has to be maintained; and good engineers design their systems in advance so that they can easily be maintained. Car engines have to be serviceable; an operating system has to come with some way of installing updates. This is no less true for AI than for any other domain. […]

Finally, AI scientists must actively do their best to stay far away from building systems that have the potential to spiral out of control. For example, because consequences may be hard to anticipate, research in creating robots that could design and build other robots should be done only with extreme care, and under close supervision. As we’ve often seen with invasive natural creatures, if a creature can reproduce itself and there’s nothing to stop it, then its population will grow exponentially.

Opening the door to robots that can alter and improve themselves in unknown ways opens us to unknown danger. Likewise, at least at present, we have no good way of projecting what full self-awareness for robots might lead to. AI, like any technology, is subject to the risk of unintended consequences, quite possibly more so, and the wider we open Pandora’s box, the more risk we assume. We see few risks in the current regime, but fewer reasons to tempt fate by blithely assuming that anything that we might invent can be dealt with.

____________

Excerpted from “Rebooting AI: Building Artificial Intelligence We Can Trust”, by Gary Marcus and Ernest Davis. Copyright © 2019 by Gary Marcus. All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher, Pantheon, a division of Penguin Random House.

* * *

An interview with Gary Marcus

The Economist: Your call for trustworthy AI would seem to entail new rules and institutions, akin to the IATA and the FAA for aviation, or the ITU and FCC for telecoms—abbreviations that basically mean big, institutional bureaucracies on a national and international scale. Is that what you want?

We definitely need something. Right now, for example, there are few regulations on what a driverless car company might release; they could be sued after the fact, but they can put on the road essentially anything they like. Drugs are much more regulated, with lengthy trial processes and so forth. Driverless cars could eventually save many lives, but, as Missy Cummings of Duke University has pointed out, until the technology is mature, we need to proceed with the kind of caution we would accord a new drug.

In the long term, we're going to have to mandate some set of innate values as well. A domestic robot à la Rosie the Robot, for example, will need to recognise the value of human life before it leaves the factory. We can't just leave values to chance, depending on what some system happens to encounter in the world, and what its so-called “training database” is.

The Economist: Why is it so hard to build causality and counterfactuals into our AI systems?

Current systems are too superficial. The leading mechanisms (eg, deep learning) discern correlation after correlation from giant data sets, but at a very shallow level. A deep-learning system might learn to recognise an elephant by noticing the textures of the elephant's wrinkly skin, but miss the significance of the trunk. We haven't yet figured out how to build systems that can both learn from large-scale databases (as deep learning does) and represent rich, articulated knowledge in the ways that earlier "classical AI" approaches sought to do.

You can't reason about what an elephant is likely to do if you don't understand what an animal is or what a trunk is. Categorising an image is simply not the same thing as reasoning about what would happen if you let an angry elephant loose in Times Square. There's a versatility to human thought that just can't be captured by correlation alone.

The Economist: Should the fact that the most sophisticated and best-performing AI can't explain how it arrived at its conclusions make us reject it for critical uses (in medicine, power grids, etc)? Or should we tolerate this so long as we can validate that it works well?

We shouldn't just be sceptical of current AI because it can't explain its answers, but because it simply isn't trustworthy enough.

And validation is actually the key to that: we don't yet have sound tools for validating machine learning; instead, we mostly have something that is closer to a seat-of-the-pants method: we try stuff in a bunch of circumstances and if it works there, we hope that we are good to go—but we often have little guarantee that what works in ordinary circumstances will also work in extraordinary circumstances. An autonomous vehicle may work on highways but fail to make good decisions on a crowded city street during a snowstorm; a medical-diagnosis system may work well with common diseases but frequently miss diagnoses of rare ones.

To some extent, the search for "explainable AI" is a bandage on that problem: if you can't make your software perfect, you'd at least like to know why it makes the mistakes that it does in order to debug it. Unless and until we get smarter about building complex cognitive software that can cope with the complexity and variability of the real world, demanding explainability in order to facilitate debugging may be the best we can do.

The Economist: Jobapocalypse, yes or no—and if so, when? How ought the state and business respond?

Next decade, not so much; next century, huge impact.

The thing to remember about the next decade is that current AI can't even read; it also can't reason. Four years ago, machine-learning pioneer Geoffrey Hinton said that radiologists would lose their jobs in five or ten years; radiologists got really scared and a lot of people stopped studying radiology. So far? Not one radiologist has actually been replaced. Radiology isn't just about looking at images (which deep learning is good at); it's also about reading (patient histories) and reasoning (about how to put words and images together), and machines can't yet do that reliably. What we have now are new tools that can help radiologists, not radiologists-in-a-box.

That said, more powerful forms of AI will eventually emerge, and at that point I think it is inevitable that we will have to move towards something like a universal basic income, and a new way of life in which most people find fulfilment through creative endeavours rather than employment.

The Economist: Is there anything that you think AI will never be able to do, that humans will always be able to do well—and thus give humans an edge in society?

I wouldn't count on it. Paraphrasing the late musician Prince, never is a mighty long time. Human brains are the most flexible machines for thinking currently found on the planet, but (as I explained in an earlier book, “Kluge”) they are a long way from perfection; no physical law prevents us from building machines with minds as powerful as the human mind, nor from building machines with minds more powerful and more reliable than anything biology has thus far developed.

On the other hand, I don't foresee machines trying to get an "edge" over us; they do what they do because they follow the programs that guide them. Unless something fundamental changes, we will still be in charge, even if our machines exceed us in pure cognitive capacity. Calculators are better at arithmetic than people are, but thus far have shown no interest whatsoever in taking over society.

The Economist: What would you tell a 15-year-old to study, to be relevant in the workplace of the future?

Learn how to learn. Creativity, on-the-fly learning and critical-thinking skills are going to be what matters. Your grandchildren may live in a world without work, but you won't; instead you will live in a world that is rapidly changing. Whatever you choose to do, learning to code will be a very valuable skill, even if you don't do it for a living, because knowing the basic logic of machines will be critical to thriving in our society as it adapts to the ever-growing powers of machines.
