A Window Into How YouTube Trains AI To Moderate Videos

A Mechanical Turk task shared with WIRED provides a glimpse into how algorithms are trained to spot and sort content on the video platform.

It’s no secret that YouTube has struggled to moderate the videos on its platform over the past year. The company has repeatedly faced scandals over its inability to rid itself of inappropriate and disturbing content, including some videos aimed at children. Often missing from the discussion over YouTube’s shortcomings, though, are the employees directly tasked with removing things like porn and graphic violence, as well as the contractors who help train AI to detect unwelcome uploads. But a Mechanical Turk task shared with WIRED appears to provide a glimpse into what training one of YouTube's machine learning tools looks like at the ground level.

MTurk is an Amazon-owned marketplace where corporations and academic researchers pay individual contractors to perform micro-sized services—called Human Intelligence Tasks—in exchange for a small sum, usually less than a dollar. MTurk workers help keep the internet running by completing jobs like identifying objects in a photo, transcribing an audio recording, or helping to train an algorithm.
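For a rough sense of the mechanics on the requester side, here is a minimal sketch of posting a HIT through Amazon’s MTurk API using the boto3 Python library. The title, reward, form URL, and other parameters are illustrative placeholders, not details of any actual YouTube task.

```python
import boto3

# Connect to the MTurk requester sandbox (Amazon's test environment);
# production requests go to https://mturk-requester.us-east-1.amazonaws.com
mturk = boto3.client(
    "mturk",
    region_name="us-east-1",
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

# An ExternalQuestion points workers at a form hosted by the requester.
# The URL here is a placeholder, not a real task form.
question_xml = """
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://example.com/video-review-form</ExternalURL>
  <FrameHeight>600</FrameHeight>
</ExternalQuestion>
"""

hit = mturk.create_hit(
    Title="Watch a short video and answer questions about its content",
    Description="Categorize a video clip (hypothetical example task)",
    Keywords="video, categorization, labeling",
    Reward="0.10",                    # payout in USD, passed as a string
    MaxAssignments=3,                 # how many workers complete the task
    LifetimeInSeconds=60 * 60 * 24,   # HIT stays listed for one day
    AssignmentDurationInSeconds=600,  # each worker has 10 minutes to finish
    Question=question_xml,
)
print("HIT ID:", hit["HIT"]["HITId"])
```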

And while MTurk workers don't make content moderation decisions directly, they do routinely help train YouTube’s machine learning tools in all sorts of ways. Those tools do more than just find inappropriate videos; they also aid other parts of YouTube’s system, like its recommendation algorithm.

“YouTube and Google have been posting tasks on Mechanical Turk for years,” says Rochelle LaPlante, the Mechanical Turk worker who shared the specific assignment with WIRED. “It’s been all different kinds of stuff—tagging content types, looking for adult content, flagging content that is conspiracy theory-type stuff, marking if titles are appropriate, marking if titles match the video, identifying if a video is from a VEVO account.”

LaPlante says that the tasks and guidelines often change. Some appear to be directly related to detecting offensive content, while others appear to be about helping determine whether a video is appropriate for a specific audience segment, like children. “Some workers have suspected this is related to decision making in which channels should be monetized or demonetized,” she says.

Watch and Learn

The specific moderation task shared with WIRED, which LaPlante completed on March 14 for a payout of 10 cents, is fairly straightforward, though it leaves plenty of room for the worker's opinions. The job offers a window into a usually opaque process: how a human's interpretation of a video is later used to help train a machine learning algorithm. And even inside YouTube, machine learning algorithms only flag videos; determining whether something violates the company's Community Guidelines remains a human's job.

The MTurk HIT asks the worker to watch a video, and then tick a series of boxes about what it contains. It also asks them to pay attention to the video's title and description. The MTurk worker should “watch enough of the video” to be confident in their judgment, and the HIT suggests they consider watching it at 1.5x speed to quicken the process. The questions address whether the clip contains “crude/coarse language” or “adult dialog,” including “offensive or controversial views.” It asks MTurk workers to differentiate between artistic nudity and content designed to “arouse or sexually gratify.”

One especially ambiguous section asks the worker to differentiate between “graphic depictions (actual or fictional) of drug use” and “incidental or comedic use of soft drugs.” The task doesn't include a list of what counts as a hard or soft drug, though it does indicate that “hard drugs” include heroin. At the end of the task, the worker judges whether they think the video is appropriate for children.
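It isn't public how YouTube stores these answers, but one plausible shape for the resulting training data is a multi-label record per video, with several workers' answers merged by majority vote. The field names below are invented for illustration, loosely following the questions in the HIT.

```python
from collections import Counter

# A hypothetical record of one worker's checkbox answers. The schema is
# a guess based on the questions WIRED describes; YouTube's real data
# format is not public.
label_record = {
    "video_id": "dQw4w9WgXcQ",   # placeholder ID
    "worker_id": "A1EXAMPLE",    # placeholder worker
    "crude_or_coarse_language": False,
    "adult_dialog": True,
    "nudity_artistic": False,
    "nudity_sexual_gratification": False,
    "drug_use_graphic": False,
    "drug_use_incidental_soft": True,
    "appropriate_for_children": False,
}

def majority_vote(answers):
    """Collapse several workers' True/False answers into one label."""
    return Counter(answers).most_common(1)[0][0]

# Three workers disagree on one question; the majority answer becomes
# the training label for a supervised classifier.
print(majority_vote([True, True, False]))  # True
```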

The MTurk task that LaPlante completed for YouTube.

In order to make the federal minimum wage of $7.25, an MTurk worker would need to complete 72.5 tasks like this in an hour, meaning there's an incentive to answer these questions extremely quickly. While some of the questions YouTube asks are straightforward (Is there any speech or singing in the audio?), most are nuanced, and underscore the complexity of training an artificial intelligence to help sort a gigantic, global video platform. The average cat video likely wouldn’t trip up a worker assigned to this task, but it’s not hard to imagine how, say, a political rant about abortion might.
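The arithmetic behind that figure, restated as a quick calculation:

```python
# Federal minimum wage divided by the per-task payout gives the number
# of 10-cent tasks a worker must finish per hour to reach minimum wage.
minimum_wage = 7.25   # USD per hour
payout = 0.10         # USD per task
tasks_per_hour = minimum_wage / payout
print(tasks_per_hour)          # 72.5
print(3600 / tasks_per_hour)   # ~49.7 seconds per task, video included
```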

It’s not clear what purpose LaPlante’s specific task serves. It may be used for content moderation specifically, or for some other function, and YouTube declined to comment on the record about whether it had created this specific task. The video link included in the assignment now leads to a page that says it’s “unavailable.” The video was captured by the Internet Archive’s Wayback Machine 56 times between September 2016 and March 2018, but even the earliest screenshots say the video “doesn’t exist.” LaPlante also doesn’t recall the exact clip. “I don’t remember any one video in particular but it seemed to be a bit of everything—uploads from individual people, clips from TV or movies, advertising, video gaming. It wasn’t one particular genre or type of video,” she says.

Human Helpers

In December, YouTube pledged to increase its moderation workforce to 10,000 people in 2018. MTurk workers don’t count toward that number, because they’re not doing content moderation work outright, but instead helping to train AI to aid in that process in the future.

“Even if they are only using MTurk to train machine learning algorithms, I would expect that some of this training would be training their algorithms to be able to do content moderation with less human involvement,” says LaPlante. “So while we may not be doing live content moderation on MTurk, we could still be contributing to content moderation in that we could be training the automated content moderation systems.”

Sarah T. Roberts, who researches content moderation at UCLA’s Graduate School of Education and Information Studies, says it has become more common for platforms like YouTube to use micro-labor sites such as Mechanical Turk to complete “secondary or tertiary activities” like training algorithms. “That has become more of a question, and people like [LaPlante] and others who have long-term experience with working on micro-labor websites have a pretty sophisticated eye to spot that sort of thing.”

YouTube desperately needs the artificial intelligence tools that LaPlante and other MTurk workers train. The platform has failed repeatedly over the last several months to police itself. Since the new year alone, it has had to confront one of its biggest stars for uploading a video featuring a suicide victim’s body, faced criticism for allowing a conspiracy theory about a Parkland shooting survivor to trend on the platform, and failed to ban a white supremacist group believed to be connected to five murders until it came under public pressure.

For the most part, though, conversations around how the platform should reform haven't involved the actual systems and individuals tapped to help YouTube improve. Part of that equation includes MTurk workers, who help train YouTube’s newest machine learning tools, which will likely one day assist moderators in detecting inappropriate content more quickly and accurately.

Algorithms already detect 98 percent of violent extremist videos on YouTube, according to the company, though a human moderator still reviews these videos. In the future, algorithms will likely take on an even greater share of content moderation work. For now, though, most AI isn’t smart enough to make nuanced decisions about what kind of content should stay and what should go.
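That division of labor, software flags and humans decide, can be sketched as a simple threshold pipeline. The classifier score, threshold, and review queue below are illustrative stand-ins; YouTube has not described its internal system at this level.

```python
from dataclasses import dataclass, field
from typing import List

FLAG_THRESHOLD = 0.8  # illustrative cutoff; the real value is not public

@dataclass
class ReviewQueue:
    """Videos the model flags; a human moderator makes the final call."""
    pending: List[str] = field(default_factory=list)

    def add(self, video_id: str) -> None:
        self.pending.append(video_id)

def triage(video_id: str, score: float, queue: ReviewQueue) -> None:
    # The model only flags; per YouTube's description, removal decisions
    # stay with human reviewers.
    if score >= FLAG_THRESHOLD:
        queue.add(video_id)

queue = ReviewQueue()
triage("abc123", score=0.93, queue=queue)  # flagged for human review
triage("def456", score=0.12, queue=queue)  # passes through unflagged
print(queue.pending)  # ['abc123']
```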

At the ground level, it's not hard to see why. The foundation of YouTube's fancy artificial intelligence technology often boils down in part to an MTurk worker making snap decisions for pennies. Attempting to replicate human judgment is no easy task, and an MTurk worker's responses to YouTube's questions can't help but be subjective. Even built with the best intentions, algorithms will never be neutral or completely impartial, because they're built by humans. Sometimes, they're even the result of underpaid people watching YouTube videos at 1.5 times normal speed.
