Facebook's Human-Powered Assistant May Just Supercharge AI

Facebook's new virtual assistant, M, is driven mostly by humans. But it may still be a step forward for AI.

Face it: Siri sucks. So often, she has no clue what you're saying. And when she does, there's a pretty good chance she'll respond with nothing more than a page filled with Internet links.

Part of the problem is that Apple's talking digital assistant is built on old technology. But even if the company upgrades Siri to the latest in artificial intelligence, she'll fall well short of an assistant made of flesh, blood, and neurons. As far as artificial intelligence has come in the last few years, it's still a long way from intelligence.

With M, its new virtual assistant, Facebook admits as much.

Built atop Facebook Messenger, the company's instant messaging app, M made its debut this morning, arriving on the phones of a few hundred unsuspecting souls in the San Francisco Bay Area. Yes, it's the company's answer to Siri and similar services like Google Now and Microsoft Cortana. But it tackles a broader range of tasks, at least as Facebook describes it. You can ask M questions along the lines of "Can you make me dinner reservations?" or even "Can you help me plan my next vacation?" and it will comply.

That's because Facebook designed the tool so that AI technology responds to these questions in tandem with humans. "The AI tries to do everything," says Alex Lebrun, the founder of Wit.ai, a startup Facebook acquired to help build this smartphone tool. "But the AI is supervised by the people."

In the larger world of AI-driven personal assistants, M may seem like a regression. And as Facebook tests the tool with the public, it's unclear whether this human-machine partnership can keep pace as the project expands to an ever-larger audience. But in a counterintuitive way, M may actually be a step forward for AI.

The idea is that humans will not only answer queries the AI is incapable of answering but, in the long run, help to improve this AI. Today's artificial intelligence, you see, requires at least some human training. If you want a system to automatically identify cats in YouTube videos, humans must first show it what a cat looks like. They must tag all sorts of feline photos. They must provide data. Through the human staff backing M, Facebook is doing this type of thing in unusually complex ways. "This is why we have this big team of people," Lebrun says. "The data we need is nonexistent."

In answering your questions, these humans will provide the data needed to bootstrap a much more sophisticated digital assistant based on a separate form of artificial intelligence called "deep learning." This could take many years. But such is the way with AI. "Human-level AI is a good philosophical discussion to have," says Dennis Mortensen, the CEO and founder of x.ai, a startup offering an online personal assistant that automatically schedules meetings, "but it's not going to happen anytime soon."

New Old Technology

Part of the irony is that Wit.ai offers pretty old AI. Its technology is based on two algorithms—"conditional random fields" and "maximum entropy classifiers"—that have served the tech world for more than a decade. But it provides a stepping stone as the M project seeks a loftier breed of artificial intelligence.

Lebrun founded Wit.ai in 2013, after creating a digital agent that companies like AT&T used to communicate with their customers. Basically, Wit.ai offered a service that could help software coders build Siri-like systems that could recognize speech and, to a certain extent, understand natural language. Yes, it was based on older algorithms, but it could learn to recognize speech without tapping the enormous collections of voice data available only to the likes of Apple or Google. It required less data, and it worked by pooling voice samples collected by the many developers who used it.

As David Marcus, the Facebook vice president who oversees Messenger, sought ways of expanding Messenger into areas that might generate revenue, he approached Lebrun and company, and in January, Facebook acquired the 10-person startup for an undisclosed sum. Marcus says Facebook nabbed "one of the best teams in the world at human-to-AI interactions." But according to Lebrun, the team wasn't quite sure what human-to-AI stuff they would be building.

About three months later, Marcus, Lebrun, and the rest of this small group settled on the idea of a virtual assistant that would run atop Messenger. But it wouldn't be another Siri. For one, it would communicate primarily by text, not voice. And it would answer a more complex range of questions. "You have lots of AIs—like Siri, Google Now, or Cortana—whose scope is quite limited. Because AI is limited, you have to define a limited scope," Lebrun says. "We wanted to start with something more ambitious, to really give people what they're asking for." This meant the team would need more than AI.

Human Heavy-Lifting

When you ask M a question, the AI works to understand what you're asking and formulates a response. But rather than sending it to you, the system sends this response to human "trainers," customer-service types who work alongside the Wit.ai team inside Facebook's new building in Menlo Park, California. These trainers then decide what else must be done to provide what you're looking for.

According to Lebrun, the AI can do most of the work for simpler tasks, like telling a joke. It'll query an Internet joke API—a service that supplies jokes—and a trainer will approve the joke if it's funny. For more complicated tasks, such as making a driving test appointment at the DMV, the humans will do most of the heavy lifting. They'll actually place a call to the DMV.

In doing that heavy lifting, the humans generate a roadmap for how particular questions should be answered. "Everything the trainers do, we record every step," Lebrun says. This includes what websites they visit, what they say when calling the DMV, what they type in response to M users, and so on. In the future, this data can help drive a more advanced system based on deep learning, a form of AI that masters tasks by analyzing enormous quantities of information across a vast network of machines. Roughly speaking, these networks mimic the web of neurons in the human brain.
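To make the workflow concrete, here is a minimal Python sketch of the loop Lebrun describes: the AI drafts a reply, a human trainer approves or replaces it, and every step is recorded for later training. It is an illustration only, not Facebook's code. Every name in it (TrainingLog, draft_reply, handle_request, example_trainer) is hypothetical, and the keyword check stands in for whatever language understanding M actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class TrainingLog:
    """Hypothetical store for recorded steps, kept as future training data."""
    records: List[dict] = field(default_factory=list)

    def record(self, request: str, draft: str, final: str, steps: List[str]) -> None:
        self.records.append(
            {"request": request, "ai_draft": draft, "final": final, "trainer_steps": steps}
        )


def draft_reply(request: str) -> str:
    """Stand-in for the AI's first attempt (assumption: a simple keyword check)."""
    if "joke" in request.lower():
        return "Why did the chicken cross the road? To get to the other side."
    return "I'm not sure how to help with that yet."


# A trainer takes the request and the AI draft, and returns the final reply
# plus the list of steps they took to produce it.
Trainer = Callable[[str, str], Tuple[str, List[str]]]


def handle_request(request: str, trainer: Trainer, log: TrainingLog) -> str:
    draft = draft_reply(request)             # the AI tries first
    final, steps = trainer(request, draft)   # a human supervises the result
    log.record(request, draft, final, steps) # every step is recorded
    return final


def example_trainer(request: str, draft: str) -> Tuple[str, List[str]]:
    """Hypothetical trainer: approves jokes, does harder tasks by hand."""
    if "joke" in request.lower():
        return draft, ["approved AI draft"]
    return "Your DMV appointment is booked.", ["called the DMV", "wrote reply by hand"]
```

Calling handle_request("Tell me a joke", example_trainer, TrainingLog()) returns the AI's draft unchanged, while a DMV request returns the trainer's hand-written reply. In both cases the log keeps the request, the draft, the final answer, and the trainer's steps.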

Such neural nets have already proven enormously effective in identifying images, recognizing speech, targeting ads, even teaching robots to screw on bottle caps. And after hiring an NYU computer science researcher named Yann LeCun, Facebook is a leader in this increasingly important field. The company now uses neural nets to recognize faces in photos posted to its social network and identify what you're likely to want in your News Feed. With M, it aims to push the technology further still.

'The More You Know, The More You Don't Know'

Why not just build M with neural nets from the beginning? Without the right data, neural nets couldn't provide a service much more powerful than, well, Siri, and Wit.ai's tech can get things started with relatively little data. "This is a good way to bootstrap. With a few thousand data points, you can start to build a model," Lebrun says. "Then, using this model, you get more data, and once you have about a million data points, you go to Yann and get some deep learning."

As Lebrun describes it, this is a remarkably ambitious play. Even after bringing neural nets into the mix, he says, the company will continue to use human trainers for years on end. As M improves, it will need still more data to continue improving. "The more you know, the more you don't know," Lebrun says. "The more the AI does, the more complex tasks it will be required to do."

At least, that's the plan. M only launched today, and we don't know how well all this will work. As the company expands M to more and more users, it will need more and more trainers. Lebrun expects the number of trainers to grow linearly as the number of users grows exponentially, but the burden could be enormous. Facebook Messenger is used by more than 700 million people. "This is not easy," Lebrun admits.
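One way to read Lebrun's expectation is that each tenfold jump in users would add a fixed number of trainers, rather than multiplying the staff tenfold. The Python sketch below works through that reading. All of the numbers in it (10 starting trainers, 10 more per tenfold jump, a 100-user starting point) are hypothetical and chosen only to show the shape of the curve. They are not Facebook figures.

```python
def trainers_needed(
    users: int,
    base_users: int = 100,        # hypothetical starting audience
    base_trainers: int = 10,      # hypothetical starting staff
    extra_per_tenfold: int = 10,  # hypothetical trainers added per 10x users
) -> int:
    """Trainers grow by a fixed amount each time the user base grows tenfold."""
    trainers = base_trainers
    covered = base_users
    while covered < users:
        covered *= 10                  # users grow exponentially...
        trainers += extra_per_tenfold  # ...trainers grow linearly
    return trainers


for users in (100, 1_000, 1_000_000, 700_000_000):
    print(users, trainers_needed(users))
```

Under these made-up numbers, going from 100 users to Messenger's 700 million would take the staff from 10 trainers to 80. If trainers instead had to grow in proportion to users, the same jump would multiply the staff seven million times over, which is why the assumption matters so much.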

Meanwhile, even if Facebook can keep the system going, the AI may not evolve as quickly as Lebrun and company expect it to. Dennis Mortensen, of the digital assistant startup x.ai, says that having humans work alongside the AI (as opposed to just training it) may actually slow things down. He offers self-driving cars as an example. One way to build such vehicles is to slowly add automation as humans continue driving. But the system may evolve faster if you let the car drive itself—however ill-equipped it might be. Those small automated tools may not be essential to the end result. They may collect data you don't need.

But at the same time, like Lebrun and Facebook, Mortensen underlines the importance of data to the evolution of AI, and he says that if Facebook plays things right—properly guiding and recording and cataloging the behavior of its trainers—these humans could indeed provide a shortcut. Facebook must focus on how the humans can improve the system in the future, he says, not just in the present. That isn't easy. But it's the sort of thing Facebook does better than most.

Additional reporting by Jessi Hempel.