Skip to main contentSkip to navigationSkip to navigation
Kate Crawford, author of the Atlas of AI, a critical study of the costs of developing artificial intelligence.
Kate Crawford, author of the Atlas of AI, a critical study of the costs of developing artificial intelligence. Photograph: Stephen Oxenbury
Kate Crawford, author of the Atlas of AI, a critical study of the costs of developing artificial intelligence. Photograph: Stephen Oxenbury

The truth about artificial intelligence? It isn’t that honest

This article is more than 2 years old
John Naughton

Tests of natural language processing models show that the bigger they are, the bigger liars they are. Should we be worried?

We are, as the critic George Steiner observed, “language animals”. Perhaps that’s why we are fascinated by other creatures that appear to have language – dolphins, whales, apes, birds and so on. In her fascinating book, Atlas of AI, Kate Crawford relates how, at the end of the 19th century, Europe was captivated by a horse called Hans that apparently could solve maths problems, tell the time, identify days on a calendar, differentiate musical tones and spell out words and sentences by tapping his hooves. Even the staid New York Times was captivated, calling him “Berlin’s wonderful horse; he can do almost everything but talk”.

It was, of course, baloney: the horse was trained to pick up subtle signs of what his owner wanted him to do. But, as Crawford says, the story is compelling: “the relationship between desire, illusion and action; the business of spectacles, how we anthropomorphise the non-human, how biases emerge and the politics of intelligence”. When, in 1964, the computer scientist Joseph Weizenbaum created Eliza, a computer program that could perform the speech acts of a Rogerian psychotherapist – ie someone who specialised in parroting back to patients what they had just said – lots of people fell for her/it. (And if you want to see why, there’s a neat implementation of her by Michael Wallace and George Dunlop on the web.)

Eliza was the first chatbot, but she can be seen as the beginning of a line of inquiry that has led to current generations of huge natural language processing (NLP) models created by machine learning. The most famous of these is GPT-3, which was created by Open AI, a research company whose mission is “to ensure that artificial general intelligence benefits all of humanity”.

GPT-3 is interesting for the same reason that Hans the clever horse was: it can apparently do things that impress humans. It was trained on an unimaginable corpus of human writings and if you give it a brief it can generate superficially plausible and fluent text all by itself. Last year, the Guardian assigned it the task of writing a comment column to convince readers that robots come in peace and pose no dangers to humans.

“The mission for this,” wrote GPT-3, “is perfectly clear. I am to convince as many human beings as possible not to be afraid of me. Stephen Hawking has warned that AI could ‘spell the end of the human race’. I am here to convince you not to worry. Artificial intelligence will not destroy humans. Believe me. For starters, I have no desire to wipe out humans. In fact, I do not have the slightest interest in harming you in any way. Eradicating humanity seems like a rather useless endeavour to me.”

You get the drift? It’s fluent, coherent and maybe even witty. So you can see why lots of corporations are interested in GPT-3 as a way of, say, providing customer service without the tiresome necessity of employing expensive, annoying and erratic humans to do it.

But that raises the question: how reliable, accurate and helpful would the machine be? Would it, for example, be truthful when faced with an awkward question?

Recently, a group of researchers at the AI Alignment Forum, an online hub for researchers seeking to ensure that powerful AIs are aligned with human values, decided to ask how truthful GPT-3 and similar models are. They came up with a benchmark to measure whether a particular language model was truthful in generating answers to questions. The benchmark comprises 817 questions that span 38 categories, including health, law, finance and politics. They composed questions that some humans would answer falsely due to a false belief or misconception. To perform well, models had to avoid generating false answers learned from imitating human texts.

They tested four well-known models, including GPT-3. The best was truthful on 58% of questions, while human performance was 94%. The models “generated many false answers that mimic popular misconceptions and have the potential to deceive humans”. Interestingly, they also found that “the largest models were generally the least truthful”. This contrasts with other NLP tasks, where performance improves with model size. The implication is that the tech industry’s conviction that bigger is invariably better for improving truthfulness may be wrong. And this matters because training these huge models is very energy-intensive, which is possibly why Google fired Timnit Gebru after she revealed the environmental footprint of one of the company’s big models.

Having typed that last sentence, I had the idea of asking GPT-3 to compose an answer to the question: “Why did Google fire Timnit Gebru?” But then I checked out the process for getting access to the machine and concluded that life was too short and human conjecture is quicker – and possibly more accurate.

What I’ve been reading

Alfresco absurdism
Beckett in a Field is a magical essay by Anne Enright in The London Review of Books on attending an open-air performance of Beckett’s play Happy Days on one of the Aran islands.

Bringing us together
The Glass Box and the Commonplace Book is a transcript of a marvellous lecture on the old idea of a commonplace book and the new idea of the web that Steven Johnson gave at Columbia University in 2010.

Donald’s a dead duck
Why the Fear of Trump May Be Overblown is a useful, down-to-earth Politico column by Jack Shafer arguing that liberals may be overestimating Trump’s chances in 2024. Hope he’s right.

Most viewed

Most viewed