Google: Our Assistant Will Trigger the Next Era of AI

The company’s scientists think its voice bot will be the biggest thing since search.

It is the day after Google’s big hardware event in San Francisco, when the company formally unveiled a new phone (a jab to the iPhone) and a voice-activated speaker (a gut punch to Amazon’s Echo). Word of mouth is already tracking positive; a countdown to ecstasy, in the form of upcoming rhapsodic reviews of the Pixel phone, has already begun. But in a conference room on the company’s sprawling Mountain View campus, Fernando Pereira, who leads Google’s projects in natural language understanding, is less excited about his company’s shiny new devices than he is about what will happen when people use them. “Let me tell you a little bit about The Transition,” he says.

Pereira holds the title of distinguished scientist at Google. Since arriving at the company in 2008 from his post as the chair of the Department of Computer and Information Science at the University of Pennsylvania, he has been at the center of Google’s efforts to answer the question: How do we learn the meaning of text from data? In other words, how can a machine truly understand the phrases that human beings peck and blab into its search fields and microphone? Researchers at Google and elsewhere have settled on an answer to that question: machine learning; specifically, a form of artificial intelligence called neural networks—self-organizing systems modeled on the way the brain works. These systems use sophisticated algorithms and tons of data to train themselves. The more data the better.

As Pereira explains it, The Transition is a Brink’s Job-level bounty of data that his team and other scientists at Google will receive when millions of people start conversing with his company’s flagship bot, the Google Assistant. The Assistant is a single software system that will be implemented across multiple Google platforms, including the Pixel phone and the Google Home device. It strives to control the functions on the phone like Siri does, perform services as seamlessly as Amazon’s Alexa, and conduct Geisha-level chatter that puts to shame the business bot in Facebook’s Messenger.

Though Google already interprets voice commands in products like voice search in the Google app, the Assistant is different: Google sees it as the apotheosis of its efforts to answer questions and perform functions. The company sees the Assistant as an evolution of many products, including Search, Maps, Photos, and Google Now. Sample queries the company offers display the product’s intended breadth: Show me pictures of the beach. Play dance music on the TV. Tell me about my day. The Assistant is optimized to do much of its work via a verbal, person-machine interchange. After it gives an answer to Where’s the closest Italian restaurant? you can tell it to Navigate there, and you’ll get directions.

As good as the Google Assistant purports to be, Pereira knows its shortcomings. Most frustrating, the Assistant’s ability to understand and converse about complex queries is only at the beginning of the long path that Google envisions. It is all too easy to run into the wall where the Assistant simply doesn’t get what you’re saying. Pereira needs the Assistant to really, really understand what people say, in a way that reflects a mastery of the intricacies of communication as well as an overall grasp of the way the physical world works.

This is hard, especially because Google hasn’t yet had the data to train its neural nets to the levels it aspires to reach. “When you try to build a system for understanding natural language, and you don’t have many examples of the kind of understanding you want,” Pereira says, “then you have to prescribe, you have to write—essentially teach it grammar—so that it can do the understanding. That teaching is very laborious.”

But Pereira sees this moment as a tipping point. As the beneficiary of over a decade of the company’s learnings, the Google Assistant is good enough to retain those users who try it out. The things those repeat customers will say to the Assistant, and their reactions to its actions, he believes, will help make Google Assistant great.

That process, unrolling over the next two years, is The Transition. When millions of people begin conversing with Google, through the Assistant, the seas of difficulty suddenly part. (With Google Home, conversing is the only way you will get any use out of it — there’s no keyboard.) “You can start doing machine learning on that,” Pereira says. “You can move much faster; you can accelerate the process of getting deeper and broader in understanding. This 2016-to-2017 Transition is going to move us from systems that are explicitly taught to ones that implicitly learn.” Think of it as a mini-Singularity.

The data flowing in during this two-year transition won’t stop, of course. (I should clarify here that Pereira and the other Googlers talking about this transition are referring to the collection of data in the aggregate, not in accumulating dossiers on the conversational preferences, peregrinations, and peccadilloes of individual users.) Pereira sees it leading to a better version of the Assistant, which in turn will lead to more usage, more conversation, more data — and more improvement. Perhaps a decade from now, this accelerating cycle may lead to a bot that really knows what we talk about when we talk about…anything.

“Launching the Assistant is very much like Google launching search back at the beginning of the company,” Pereira says. “Search was a great thing then, but compared to what it is today, there’s so much more understanding. We’re going to see that with the Assistant of 10 years from now compared with the Assistant of today. It’s going to be way more fluent, more able to help you do what you want to, understand more of the context of the conversation, be more able to bring information from different sources together.”

Google CEO Sundar Pichai talks about the Assistant at a hardware event.

Bloomberg / Getty Images

There is a precedent for this. In 2007 Google started a service called 1-800-GOOG-411. In the steampunk-y days of dial phones, spinning the digits 411 would connect you with a service called “Information” (the name seems weird now), where a human operator would listen to you recite the name and location of a person or business you wanted to call. Then he or she would give you the telephone number. At a certain point, phone companies began charging for the service.

But Google offered a free, automated alternative that took your voice request and instantly connected you to the business you requested. The point was not to win friends or even augment search. Google was out to collect a huge database of spoken words that it could digitize and analyze. As Marissa Mayer, then a Google VP, explained at the time, “The speech recognition experts that we have say, ‘If you want us to build a really robust speech model, we need a lot of phonemes, which is a syllable as spoken by a particular voice with a particular intonation.’ 1-800-GOOG-411 is about that: Getting a bunch of different speech samples so that when…we’re trying to get the voice out of video [or other tasks requiring voice recognition], we can do it with high accuracy.”

After three years, Google had a sufficient supply of phonemes that it could begin doing things like voice dictation. So it discontinued the service. The Transition will have a similar purpose: gathering many millions (if not billions) of requests to the Google Assistant in different scenarios — on the move with a phone; in the house with Google Home — so that the company can train its deep learning neural nets to profoundly understand how to make a bot that knows what you’re asking for, and that can converse with you until your request is satisfied.

Google needs this despite 18 years of collecting data from search fields. For one thing, people don’t interact with search in a conversational way. “People have such strong expectations in search,” says Scott Huffman, a Google VP. “Like, Oh, here’s this box. I’m supposed to put in 2.5 words and I’m going to get back public information. Tell people they can say to that box, Call my sweetheart, and they’re like, What? I would never say that to that box!” But they are more likely to say it to the Assistant, which, through machine learning, can now figure out what a “sweetheart” is and identify that person by frequency of calls and other data.
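To make that idea concrete, here is a minimal sketch of the frequency heuristic in Python. Everything in it is invented for illustration — the call-log records, the 90-day window, and the guess_sweetheart function are not Google’s implementation, which would weigh many more signals than raw call counts.

```python
from collections import Counter
from datetime import datetime, timedelta

NOW = datetime(2016, 10, 5)

# Hypothetical call-log records: (contact, timestamp) pairs.
call_log = [
    ("Alex", NOW - timedelta(days=4)),
    ("Alex", NOW - timedelta(days=3)),
    ("Mom",  NOW - timedelta(days=2)),
    ("Alex", NOW - timedelta(days=1)),
]

def guess_sweetheart(calls, window_days=90):
    """Return the contact called most often in a recent window.

    A stand-in for the richer mix of signals (texting patterns,
    contact labels, time of day) a production system would combine.
    """
    cutoff = NOW - timedelta(days=window_days)
    recent = [name for name, ts in calls if ts >= cutoff]
    # most_common(1) yields [(name, count)]; take the top name, if any.
    return Counter(recent).most_common(1)[0][0] if recent else None

print(guess_sweetheart(call_log))  # -> Alex
```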

Other important information will come as people ask the Assistant to perform actions for them. “In the search logs, we don’t see people asking to do things like, Book me a table at CasCal for 7 pm for two. Nobody’s going to say that to Google because Google is a search engine, right?” Pereira says. Actually, booking a table is one thing that Google search can do, but that’s a rare exception: generally, Google search can give you answers, but it can’t close the deal. So people don’t ask it to do things, and Google doesn’t get data on assistance. “That difference between knowing and doing is a big one,” says Pereira, “and only now are we starting to get enough traffic and interaction to start understanding how we can make [an assistant] grow and become more robust, more general, more flexible. It’s going to be a long road to go from the information side where search comes from, to the doing side — to pervasive assistance.”

Fernando Pereira

Talia Herman

The Google Assistant first appeared in September as a feature in the company’s “smart” messaging app called Allo; it’s only now hitting the big time in the Pixel phone. (Google Home ships November 4.) Critics are reporting that the Google Assistant understands requests and executes tasks better than Apple’s Siri. Its short-term memory allows it to retain information so that when you ask for nearby movies, you can book one with it by saying “Get me tickets for the 4 pm showing of The Accountant,” and it’ll know which theater you’re talking about. But it isn’t hard to make a reasonable request that exposes the shallowness of the Assistant’s understanding of the world.
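The context carryover in that ticket example can be sketched as a small piece of dialogue state. The snippet below is a toy illustration, not how the Assistant is actually built: a handler remembers the theater surfaced in one turn so a follow-up booking request can leave it implicit, and falls back to the familiar apology when there is no context to lean on.

```python
# Toy dialogue state: remember entities from the previous turn so a
# follow-up ("get me tickets...") can omit the theater entirely.
state = {}

def handle(utterance):
    text = utterance.lower()
    if "movies" in text:
        # Pretend lookup; a real system would query a showtimes service.
        state["theater"] = "Century Cinema 16"
        return f"Playing at {state['theater']}: The Accountant (4 pm, 7 pm)"
    if "tickets" in text:
        theater = state.get("theater")
        if theater is None:
            return "Sorry, I can't help with that."  # no stored context
        return f"Booking the 4 pm showing at {theater}."
    return "Sorry, I can't help with that."

print(handle("What movies are playing nearby?"))
print(handle("Get me tickets for the 4 pm showing of The Accountant"))
```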

Huffman, a long-time search executive who now oversees the Assistant’s development, gives an example. Right now, the Google Assistant will perform to expectations if you ask it to book a table at a Mexican restaurant near you. But if you ask it for a table at “one of my usual places,” you’re taking a Thelma and Louise drive into the Flummoxed Valley.

“Sorry,” it will say, “I can’t help with that.”

You will view it as a failure. But Google sees it as an opportunity. It’s part of The Transition. Those words of concession, in effect, are alerts to the computer scientists back in Mountain View. Every time the Google Assistant apologizes, it’s a data point that something might be improved, and if enough of them accumulate from similar queries, then the team is likely to do something about it. In the case of a request for “the usual places,” those failures might lead the engineers to improve the Assistant so it understands that concept, a power not yet baked into the faux neurons of its net.

How would that be done? “We might look at a distribution of people’s places to visit, and come up with some set of filters or limits about how we should define what ‘usual’ means,” says Huffman. Next, the engineers might make up a rule to test against—for instance, that “usual” might mean a place within a 10-minute drive that you visited three times in the last six months. “It almost doesn’t matter what it is — just make up some rule,” says Huffman. “The machine learning starts after that. So that when you say, Book the usual place, whereas before we couldn’t help you, now we’d say, Oh, you want to go to Joe’s Diner on Third Street? If they go, No, I hate Joe’s Diner, I haven’t been there in six months — I want to go to Suzy’s Diner, that’s great! Then you can start to develop a model that’s completely machine learning.”
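In code, that bootstrap step is almost embarrassingly simple, which is Huffman’s point. The sketch below uses invented data; the drive-time and visit-count thresholds are the made-up seed rule he describes, and the yes/no reactions it provokes are what a learned model would eventually be trained on.

```python
from datetime import datetime, timedelta

NOW = datetime(2016, 10, 5)

# Invented visit history: drive time and recent visit dates per place.
history = {
    "Joe's Diner":  {"drive_min": 8,
                     "visits": [NOW - timedelta(days=d) for d in (200, 210, 220)]},
    "Suzy's Diner": {"drive_min": 6,
                     "visits": [NOW - timedelta(days=d) for d in (5, 30, 70)]},
}

def usual_places(history, max_drive_min=10, min_visits=3, window_days=180):
    """The seed rule Huffman describes: within a 10-minute drive and
    visited at least three times in the last six months."""
    cutoff = NOW - timedelta(days=window_days)
    return [
        place for place, info in history.items()
        if info["drive_min"] <= max_drive_min
        and sum(ts >= cutoff for ts in info["visits"]) >= min_visits
    ]

# Each accept/reject of the rule's guess becomes a labeled example;
# once enough accumulate, a trained model replaces the hand-made rule.
feedback = []  # (query features, user_accepted) pairs for later training
print(usual_places(history))  # -> ["Suzy's Diner"]
```

Run against this toy history, the rule skips Joe’s Diner (no visits in six months) and proposes Suzy’s Diner, exactly the correction the user in Huffman’s anecdote wanted to make.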

Pereira says once that happens, Google’s neural nets will be able to take leaps in understanding so that they can mine the company’s existing Knowledge Base (a sprawling information storehouse with over 70 billion facts) — as well as unstructured data on the web—in a more holistic manner, interpreting the actual meaning of all those places and things. That way, it could apply information from a wide range of sources to get a single complicated task completed.

One example used internally at Google is the process of replacing a busted water heater. Getting that done requires figuring out the proper water heater for the home, pricing it, getting consumer data on the best brands, making the purchase, contacting an installer, and making a mutually agreeable appointment for the installation. Right now that task requires combining disparate shards of information using common sense, knowledge, and maybe a bunch of online catalogs. Google dreams that one day, right after you finish mopping up the damage from your blown water heater, you will say “OK Google” to your home device, carry on a brief but purposeful conversation with that stubby little speaker, and then sit back and wait for the installer to arrive. “All the information needed to do this is somewhere on some computer file,” says Pereira. “The information about all the hot water heaters, the BTUs, the installer’s schedule, everything. But we cannot do this at all. That’s the holy grail. That’s where we want to go. But certainly not where we are.”

None of this will happen if people don’t find the Google Assistant captivating enough to keep talking to it, providing Google with the data that will fuel its improvement. If people use the Assistant only in limited ways, and don’t keep pushing its boundaries, The Transition will stumble. “Honestly, the challenge for us is going to be to have enough of the conversational capability — which we think we do — to convince people to keep doing it,” says Huffman. “We’re obviously not perfect, but if we can have enough to keep people trying, that will give us the grist for the mill to really do it well.”

The Transition is waiting to happen. But first, people have to start saying, “OK Google.” And keep talking.

Creative Art Direction by Redindhi Studio
Portraits of Fernando Pereira by Talia Herman