Facebook's Augmented Reality Engine Brings AI Right to Your Phone

Facebook can't realize its AR ambitions if AI stays tied to data centers. So it's bringing neural nets right to your phone.

When Hussein Mehanna showed off a new incarnation of Facebook's big blue app back in November, it seemed like a tiny improvement—at least on the surface. The app could transform a photo from your cousin's wedding into a Picasso or a Van Gogh or a Warhol, a bit of extra fun for your social feed. But Mehanna and his team of Facebook engineers were laying the groundwork for an audacious effort to change the future of computing—what Facebook CEO Mark Zuckerberg calls a platform for augmented reality.

Zuckerberg formally unveiled this platform on Tuesday morning during his keynote at F8, Facebook's annual developer conference. In short, Facebook is transforming the camera on your smartphone into an engine for what is commonly called AR. The company will soon allow outside companies and other developers to build digital effects that you can layer atop what you see through your camera. "This will allow us to create all kinds of things that were only available in the digital world," Zuckerberg said on stage at the civic center in downtown San Jose, California. "We're going to interact with them and explore them together."

Initially, Facebook will offer ways of applying these effects to still images, videos, or even live video shot with your phone. On stage, Zuckerberg showed how you could add a digital coffee cup to a photo of your kitchen table—or even a school of digital sharks that swim endlessly around your bowl of cereal. But the company is also working on ways of "pinning" digital objects to specific locations in the real world. You could "attach" a digital note to your refrigerator, and if your spouse views the fridge through her camera, she can see it too, as if the note were really there. In other words, Zuckerberg views his platform as a way of expanding a game like Pokémon Go into a fundamental means of interacting with the world around us.

That's a bold play, to say the least. And frankly, it's a very difficult thing to pull off in a purely technical sense, to say nothing of the logistical questions that surround AR. Facebook will grapple with many of those questions in the months and years to come, chief among them: Do people really want to view the world through their phones? But the company is already making serious progress on the technical side, as Mehanna's artist-filter demo made clear back in November.

Making AI Local

In applying Picasso's style to personal snapshots, that new Facebook app leans on deep neural networks, a form of artificial intelligence that's rapidly reinventing the tech world. But these neural networks are different. They run on the phone itself, not in a data center on the other side of the internet. This is essential to the kind of augmented reality Zuckerberg so gleefully pitched on Tuesday morning. You can't do what he wants to do unless these AI techniques run right there on the phone. Going over the internet takes much too long. The effect is lost.
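For a sense of the pattern at work, here's a minimal sketch in Python with PyTorch: a small feed-forward network applied to a single camera frame, entirely on the device. The architecture is a toy stand-in, since Facebook hasn't published the models it actually ships; the point is the shape of the computation—one forward pass per frame, with no network hop in the loop.

    import torch
    import torch.nn as nn

    # Toy stand-in for a feed-forward style-transfer network; the real
    # on-device models Facebook ships are not public.
    class TinyStyleNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3, 8, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.Conv2d(8, 3, kernel_size=3, padding=1),
            )

        def forward(self, frame):
            return self.net(frame)

    model = TinyStyleNet().eval()
    frame = torch.rand(1, 3, 240, 320)  # one camera frame: RGB, 240x320
    with torch.no_grad():               # inference only, no training
        stylized = model(frame)         # runs locally; no round trip to a server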

"You can think of those early demonstrations as somewhat frivolous," says Yann LeCun, Facebook's director of AI research and one of the founding fathers of the deep learning movement. "But the underlying techniques can be used for so much more."

In order to layer a digital effect atop your smiling face, for instance, Facebook must identify exactly where your smiling face sits within the camera's field of view, and that requires a neural network. As LeCun explains, the company is also using neural networks to track people's movements, so that effects can move in tandem with the real world. And according to Facebook chief technology officer Mike Schroepfer, the company is exploring ways of adding effects based not only on what people are doing but also on what they're saying. That too requires a neural network. "We're trying to build a pipeline of the core technologies that will enable all of these common AR effects," he says.
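In code, the tracking half of that pipeline reduces to a simple loop: run the detector on each frame, then recompute the effect's position from its output. The sketch below assumes a hypothetical on-device detector, detect_face, since Facebook's own isn't public.

    from dataclasses import dataclass

    @dataclass
    class Box:
        x: float  # left edge of the detected face, in pixels
        y: float  # top edge
        w: float  # width
        h: float  # height

    def place_hat(face: Box) -> tuple[float, float]:
        # Anchor a digital hat just above the center of the face box.
        return (face.x + face.w / 2, face.y - 0.25 * face.h)

    # Hypothetical stand-in for the neural-network face detector that
    # would run on the phone; it returns a dummy box for illustration.
    def detect_face(frame) -> Box:
        return Box(x=120.0, y=80.0, w=100.0, h=100.0)

    for frame in [None, None, None]:  # stand-in for a live camera stream
        hat_position = place_hat(detect_face(frame))

Because the position is recomputed on every frame, the effect stays glued to a moving subject instead of drifting away from it.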

Some of the effects that Zuckerberg described—most notably the technology that will let you pin stuff in the real world—are still months down the road, if not more. "There's a lot more that you have to get right to do that work," Schroepfer says. To attach a digital artifact to a physical location, the Facebook app must build what is really a detailed map of that location and then offer a way of sharing that map with others.

"If I want to leave a note on the table at the bar," he says, "I am both recording the precise location with GPS and recording the geometry of that scene in such a way that someone else, with a phone that was never there before, shows up and see the world and boot up this digital representation of it."

What's more, as these effects get more and more complex, they will run up against the very real hardware limits of our phones. Smartphones offer far less processing power than computer servers packed into data centers, and though Facebook has significantly slimmed down its deep learning tech for mobile devices, more complex models will require more juice. But here too, the groundwork is already being laid.
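Facebook hasn't detailed how it slims its models, but one common technique, and a plausible reading of "slimmed down," is quantization: storing a network's weights as 8-bit integers rather than 32-bit floats. A generic sketch, not Facebook's actual mobile runtime:

    import numpy as np

    weights = np.random.randn(256, 256).astype(np.float32)  # one layer's weights

    # Map the float range onto signed 8-bit integers (a 4x size reduction).
    scale = np.abs(weights).max() / 127.0
    quantized = np.round(weights / scale).astype(np.int8)

    # At inference time the integers are rescaled back, approximately.
    restored = quantized.astype(np.float32) * scale

    print(weights.nbytes / quantized.nbytes)  # 4.0: a quarter of the memory
    print(np.abs(weights - restored).max())   # small approximation error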

Intel, Qualcomm, and other chip makers are working to build mobile processors better suited to these kinds of machine learning techniques. According to Schroepfer, hardware enhancements like these could provide a two- to three-fold boost for the company's machine learning models. "We've seen things go from 10 frames per second to 30 frames per second," he says. "That's the difference between it's-not-really-usable and it's-kinda-fun."
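The arithmetic behind that quote is worth spelling out: at 10 frames per second, each frame gets 100 milliseconds of compute; at 30, roughly 33. A three-fold speedup is exactly what turns the former budget into the latter.

    # Per-frame compute budget at each frame rate, in milliseconds.
    before_ms = 1000 / 10  # 100.0 ms per frame at 10 fps
    after_ms = 1000 / 30   # ~33.3 ms per frame at 30 fps

    print(before_ms / after_ms)  # 3.0: the "two- to three-fold" boost in action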

Zuckerberg's grand vision for camera AR is still under development. But the path is in place—at least technically.