
The era of fake writing is upon us

A new reason to be skeptical of posts you read online.

Joss Fong is a founding member of the Vox video team and a producer focused on science and tech. She holds a master's degree in science, health, and environmental reporting from NYU.

Until 2019, if you came across several paragraphs of text on a consistent topic with consistent subjects, you could safely assume that text was written or structured by a human being. That is no longer true.

Over the past year, AI researchers have designed computer programs that can generate multi-paragraph stories that remain fairly coherent throughout. As we explain in the video above, these programs produce passages of text that read as if they were written by someone who is fluent in the language but possibly faking their knowledge. I’m not sure we needed an automated version of that person, but here it is:

A fake news article: this excerpt was entirely machine generated by a model trained on news articles from 5,000 publications. (Adam Geitgey, newsyoucantuse.com)

Depending on how you look at it, this technology is a powerful bullshit machine or a promising tool for artists. So far, the creative uses seem to outnumber the malicious ones, but it’s not difficult to imagine how text-fakes could cause harm, especially since these models have been widely shared and are now deployable by anyone with basic know-how.
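How low is the bar? Here’s a minimal sketch, assuming the open-source Hugging Face transformers library and the publicly released GPT-2 weights (one of the widely shared models; any released language model would work similarly). Generating text takes only a few lines:

```python
# A minimal sketch of open-ended text generation, assuming the
# open-source Hugging Face "transformers" library and the publicly
# released GPT-2 weights. Illustrative, not a recommendation.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "Scientists announced today that"
# Ask for one continuation of up to 60 tokens from the prompt.
result = generator(prompt, max_length=60, num_return_sequences=1)
print(result[0]["generated_text"])
```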

The field of Natural Language Processing (NLP) didn’t set out to create a fake news machine. Rather, this is the byproduct of a line of research into massive language models — machine learning programs that build vast statistical maps of the correlations between words. They look at a sample of text and guess the next word based on how frequently that word appeared in similar contexts in the training data.
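In its most stripped-down form, that idea can be sketched in a few lines of Python. This is a deliberately crude toy, not any research system: it just counts which word follows which in a training sample and predicts the most frequent continuation.

```python
from collections import Counter, defaultdict

# Toy next-word predictor: count how often each word follows each
# context word in a training sample, then predict the most frequent
# continuation. (Illustrative only.)
training_text = (
    "the cat sat on the mat . the dog sat on the rug . "
    "the cat chased the dog ."
)

follow_counts = defaultdict(Counter)
words = training_text.split()
for prev, nxt in zip(words, words[1:]):
    follow_counts[prev][nxt] += 1

def predict_next(word):
    """Return the word that most often followed `word` in training."""
    return follow_counts[word].most_common(1)[0][0]

print(predict_next("the"))  # -> "cat" in this tiny sample
```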

That sounds simple, but it’s an incredibly challenging task. The models need to account for the fact that the same word can mean different things depending on its context. They need to sort out which pronouns refer to which nouns. And they need to keep track of long-range dependencies: words whose meanings hinge on other words that sit relatively far away. Because most earlier models looked only at the immediate context, they couldn’t sustain a consistent idea or story.
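To see why a short memory matters, consider a toy sketch of what a model with a two-word window actually gets to see (again, illustrative, not any particular system):

```python
# Toy illustration: a model with a two-word context window sees the
# identical context in two very different passages, so everything
# before the window, including who "She" refers to, is invisible.
def context_window(text, n=2):
    """Return the only words an n-word-window model may condition on."""
    return text.split()[-n:]

passage_a = "Marie Curie won a Nobel Prize in 1903 . She studied"
passage_b = "The storm wrecked the harbor in 1903 . She studied"

print(context_window(passage_a))  # ['She', 'studied']
print(context_window(passage_b))  # ['She', 'studied']
```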

That has changed for two reasons. First, these models are now “pretrained” on far more data than before: millions of articles pulled from the internet. Second, computers can now handle that much data because researchers adopted a new architecture called the “Transformer,” which makes far more efficient use of computing power. The result is that the models can access more contextual information about each word and therefore make more plausible predictions about what comes next.
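For readers who want to peek under the hood, here is a minimal sketch of the “scaled dot-product attention” at the core of the Transformer. Real models wrap this in learned projections, many attention heads, and dozens of stacked layers; this stripped-down version just shows how every word gets to draw information from every other word in the context.

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)  # relevance of every word to every other word
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the context
    return weights @ V  # each word becomes a weighted mix of the context

# Four "words," each an 8-dimensional vector (random stand-ins here).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
print(attention(x, x, x).shape)  # (4, 8): each word now carries context from all others
```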

Over the past few years, pretrained language models have enabled huge strides across a number of language tasks. Text generation is a key part of language translation, chatbots, question answering, and summarization. The problem is that in their simplest form, when they’re prompted to do open-ended generation, language models are indifferent to the truth. That’s what makes them creative, but it also puts them on the wrong side of the battle against trolls, propagandists, and con artists online.

Bots roam the internet in huge numbers, primarily deceiving other computers. Now, with a decent handle on our language, they have new ways of deceiving humans directly. Of course, it has always been possible to simply hire people to write posts, fake reviews, and misinformation. What this tool adds is scale, language fluency, and the ability to mirror the jargon and writing style of any profession or, with enough samples, any individual.

The recent advances in language modeling mean that voice assistants will get better, chatbots will get better, and businesses will have better ways of analyzing documents. But for the places where humans gather online to talk to other humans, the internet will get a little worse.

You can find this video and all of Vox’s videos on YouTube. And join the Open Sourced Reporting Network to help us report on the real consequences of data, privacy, algorithms, and AI.


Open Sourced is made possible by the Omidyar Network. All Open Sourced content is editorially independent and produced by our journalists.
