The gamer grinds through play after play of the Atari classic Space Invaders. Through an interminable chain of failures, the gamer adapts its strategy, reaching for an ever-higher score. But this is no human with a joystick in a 1970s basement. Artificial intelligence is learning to play Atari games. The Atari addict is a deep-learning algorithm called DQN.

This algorithm began with no previous information about Space Invaders—or, for that matter, the other 48 Atari 2600 games it is learning to play, and sometimes master, after two straight weeks of gameplay. In fact, it wasn't even designed to take on old video games; it is a general-purpose, self-teaching computer program. Yet after watching the Atari screen and fiddling with the controls for two weeks, DQN plays at a level that would humiliate even a professional flesh-and-blood gamer.


Volodymyr Mnih and his team of computer scientists at Google, who have just unveiled DQN in the journal Nature, say their creation is more than just an impressive gamer. Mnih says the general-purpose DQN learning algorithm could be the first rung on a ladder to artificial intelligence.

"This is the first time that anyone has built a single general learning system that can learn directly from experience to master a wide range of challenging tasks," says Demis Hassabis, a member of Google's team.

Inside DQN's Brain

[Image: a neuroscience-inspired rendering of the Deep Q-Network (DQN) agent, shown in a video-hologram style exerting mastery over its data-rich external environment.]

The algorithm runs on little more than a powerful desktop PC with a souped-up graphics card. At its core, DQN combines two separate advances in machine learning in a fascinating way.

The first advance is a reinforcement-learning method called Q-learning. This is where DQN, or Deep Q-Network, gets its middle initial. Q-learning means that DQN is constantly trying to make joystick and button-pressing decisions that will get it closer to a property that computer scientists call "Q." In simple terms, Q is what the algorithm estimates to be the biggest possible future reward for each decision. For Atari games, that reward is the game score.
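
For the curious, here is a minimal sketch of the Q-learning update at the heart of that idea. The state names, actions, and numbers below are illustrative stand-ins, not DQN's actual internals:

```python
# Minimal tabular Q-learning sketch (illustrative, not DQN's real code).
# Q[state][action] estimates the biggest total future score reachable by
# taking `action` in `state` and then playing well afterward.
from collections import defaultdict

Q = defaultdict(lambda: defaultdict(float))
alpha, gamma = 0.1, 0.99  # learning rate; discount on future rewards

def q_update(state, action, reward, next_state, actions):
    # The best score the agent believes it can still reach from next_state.
    best_next = max(Q[next_state][a] for a in actions)
    # Nudge the estimate toward the observed reward plus discounted future.
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])

# One hypothetical experience: firing while under an alien earned 30 points.
q_update("alien_overhead", "fire", 30.0, "alien_destroyed",
         ["left", "right", "fire"])
```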

At no point does DQN actually "understand" what's going on in the game in the way a human does

Knowing which decisions will lead it to the high scorer's list, though, is no simple task. Keep in mind that DQN starts with zero information about each game it plays. To maximize your score in a game like Space Invaders, you have to recognize a thousand different facts: how the pixelated aliens move, that shooting them earns you points, when to shoot, what shooting does, that you control the tank, and many more, most of which a human player understands intuitively. And if the algorithm switches to a racing game, a side-scroller, or Pac-Man, it must learn an entirely new set of facts.

That's where the second machine-learning advance comes in. DQN is also built on a vast artificial neural network, partially inspired by the human brain. Simply put, the neural network is a complex program built to sift meaningful information from noise. It tells DQN what is and isn't important on the screen.
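
As a rough illustration of that filtering, here is a toy example in which a single network layer compresses a screen's worth of pixels into a handful of feature signals. The layer size and random weights are stand-ins; DQN's real network is convolutional, much deeper, and learns its weights from gameplay:

```python
import numpy as np

rng = np.random.default_rng(0)
screen = rng.random(84 * 84)  # a flattened 84x84 grayscale game frame

# In a trained network these weights encode what matters on screen;
# here they are random placeholders.
weights = rng.standard_normal((32, 84 * 84)) * 0.01

# Each output acts as a feature detector: a weighted summary of the frame
# that, after training, might fire for an alien, a bullet, or the tank.
features = np.maximum(0, weights @ screen)  # ReLU keeps only strong signals
print(features.shape)  # (32,): thousands of pixels distilled to 32 signals
```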

Together, the artificial neural network and the Q-learning system allow DQN to soak up information in chunks. DQN looks at the last three frames of the Atari game it's playing (plus the current one) and, over time, uses its past experience to predict which move will do the most for its future score. It learns through trial and error; at no point does DQN actually "understand" what's going on in the game the way a human player does. But it gets better and better at relating the images it receives from the game screen to an optimal decision.
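
Here is a hedged sketch of that decision loop, with a crude linear stand-in for DQN's learned network. Only the overall shape follows the paper's description: a stack of recent frames goes in, one predicted-score estimate per joystick action comes out, and the agent occasionally explores at random:

```python
import numpy as np

ACTIONS = ["noop", "left", "right", "fire"]  # illustrative action set
rng = np.random.default_rng(1)
W = rng.standard_normal((len(ACTIONS), 4 * 84 * 84)) * 0.01  # stand-in weights

def choose_action(last_four_frames, epsilon=0.05):
    # Stack the four most recent 84x84 frames into one input, so the
    # network can see motion rather than a single still image.
    state = np.concatenate([f.ravel() for f in last_four_frames])
    q_values = W @ state  # one estimated future score per possible action
    # Trial and error: occasionally try a random move instead of the
    # current best guess; this is how new strategies get discovered.
    if rng.random() < epsilon:
        return int(rng.integers(len(ACTIONS)))
    return int(np.argmax(q_values))

frames = [rng.random((84, 84)) for _ in range(4)]
print(ACTIONS[choose_action(frames)])
```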

Like so many of us, DQN performs better at some games than others. The algorithm is frighteningly good at Breakout, Boxing, and Star Gunner, for example: orders of magnitude better than any human. But it fumbles at the side-scroller Montezuma's Revenge. (The researchers believe this is probably related to Montezuma's Revenge's point system, which does not consistently reward moving through the game with more points.)

Is DQN Artificial Intelligence?


"This system that we've developed is just a demonstration of the power of the general algorithms," says Koray Kavukcuoglu, one of DQN's developers. "The idea is for future versions of the system to be able to generalize to any sequential decision-making problem," such as the (admittedly distant) task of sorting through scientific data and drawing scientific conclusions. For now, they says DQN should be "applicable to many other tasks," he says, including more complex videogames.  

Bernhard Schölkopf, director of the Max Planck Institute for Intelligent Systems in Tübingen, Germany, who was not involved in DQN's development, applauds the Google group's work. In a written analysis of DQN, also published today in Nature, he refers to it as "a remarkable example of the progress being made in AI," and makes a comparison to Deep Blue, the famous computer that beat chess champion Garry Kasparov in 1997.

This algorithm began with no previous information about Space Invaders

The Google researchers are careful to call DQN an "artificial agent" rather than AI. But the algorithm nonetheless starts to blur the lines. How do you classify a program that teaches itself to excel at tasks it's not designed for?

AI researchers such as Douglas Hofstadter have previously told Popular Mechanics that the term "artificial intelligence" should be defended with a hard line. Technologies like IBM's Watson or Apple's Siri are not AI, Hofstadter argued, because they do not comprehend or think. To Hofstadter, comprehension is the core of the "intelligence" in AI.

But Google's Hassabis says that algorithms like DQN are "more human-like [than Watson or Siri] in the sense that they learn how humans learn… from experiencing the world around us—and then our brains make models of the world and make decisions about what to do."

William Herkewitz
Science & Technology Reporter
William Herkewitz is a science and technology journalist based in Berlin, Germany. He writes about theoretical physics, AI, astronomy, board games, brewing and everything in between.