Apr 2, 2018 7:00 AM

How Grubhub Analyzed 4,000 Dishes to Predict Your Next Order

To create a recommendation engine, the online food-delivery service spent eight years resolving a classic problem of unstructured data.

All Matt Maloney wanted to know was whether Chicago-style deep dish pizza is better than New York-style thin crust. It’s a simple question.

If he were anyone else, Maloney would have had to get violently anecdotal. Deep dish, while delicious, is obviously not so much a pizza as a casserole; conversely, if you want to put pizza toppings on a cracker, why not just order a flatbread? (Maloney is from Chicago, so you can guess which side he comes down on.)

But no. Maloney felt like he should be able to literally answer the question. Because in addition to being deeply dishian, he’s also the CEO of Grubhub, the biggest online food-delivery service in the US. “Given the volume of transactions I do on a daily basis,” Maloney says, “I should be able to tell you, objectively, which is better.”

Don’t let’s fight about whether “popular” equals “better.” Because broadly, Maloney is of course right. With 14.5 million active users ordering from 80,000 restaurants, Grubhub data ought to be able to tell you a lot about food. Maloney wanted to be able to segment, quantify, and compare who was ordering what across neighborhoods and cities. He wanted to algorithmically recommend dishes, help restaurants optimize their food choices, attract new customers with slicker service, and frankly get customers all over the country to act more like New Yorkers, who order from somewhere at least once a week.

Today Grubhub does indeed have an algorithm that can look across a country’s worth of take-out orders and tell a user what Indian joint near them delivers the most popular chicken tikka masala. But getting there required solving a seemingly impossible data problem, some high-end machine learning, and a cookbook author from Brooklyn.

Comparing Pad Thai

The problem was the data. Not the orders: the who-orders-what and from-where. Those are easy. It was the menus. Nobody’s dishes matched; each one was unique. A pilaf from one restaurant might be a biryani at another. Japanese curries weren’t Indian curries weren’t Pakistani curries. Grubhub’s teams worked on it for eight years. “Every time, the product and tech groups came back and said, ‘Matt, this is way too hard. Ultimately, to get what you want, it’s going to be a manual solution and we have 10 other things that are a priority,’” Maloney says.

His response: “Guys. We’re a multibillion-dollar company and we can’t tell people what the intrinsic value of these fucking dishes are? We can’t even compare pad thai across the country?”

“So I made them do it,” Maloney says.

Grubhub is only a multibillion-dollar company in the volume of food it moves, not in its revenues, but even so: What Maloney wanted is a tricky problem. That’s because of the unstructured, sui generis nature of restaurant menus. If you don’t have a methodology designed to produce data ready-made for statistical analysis, you’re using “found” data, which is always messy, says Duncan Watts, a social scientist at Microsoft Research. “In data science there’s a trope about how 90 percent of the work involved is cleaning and organizing the data itself,” Watts says. “It’s true for email data, browser data, Twitter data, news media data, and even administrative data that’s supposed to be clean.”

As usual, the whole system would be a lot simpler without people in it. If you’re trying to build a recommendation engine for, say, a vast streaming entertainment service, well, most people don’t watch the same movie over and over. So you get a spread on their behavior. That might be less true when it comes to dinner orders. “I’ve read some papers that say there are explorer types and there are the types who say, ‘this is my favorite restaurant, so why should I go anywhere else?’” says Joel Sokol, director of the Master of Science in Analytics degree at Georgia Tech. So they might not want a new recommendation, no matter how perfect. “That’s really more a business problem than a data problem,” Sokol says.

Most products in ecommerce have agreed-upon metadata, so-called stock-keeping units (or SKUs) that numerically keep track of inventory. As a result, “buying, navigating, discovering, personalizing, and recommending are relatively easy because everything looks the same to everyone,” says Maria Belousova, Grubhub’s CTO. “When it gets to food, it’s completely the opposite. Grubhub and every other company were trading paragraphs of text with a title and a price tag.”

A chef who used a regional, nonstandard spelling on the name of a dish rendered that menu incompatible with others that used a standard spelling. Leave out an ingredient and suddenly it’s a different dish. Belousova says the way to reconcile such differences is often through “collaborative filtering, meaning people who like this also like that.” But she says that for hyperlocal businesses like neighborhood restaurants, collaborative filtering doesn’t work well. There aren’t enough people to collaborate and there aren’t enough options to filter. The universe of choices and choosers is too small.
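To make “people who like this also like that” concrete, here is a minimal sketch of item-to-item collaborative filtering based on co-occurrence counts. It is an illustration only, not Grubhub’s code; the function names and the toy order data are assumptions made up for the example.

```python
from collections import defaultdict
from itertools import combinations


def co_occurrence(orders_by_diner):
    """Count how many diners ordered each pair of items."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in orders_by_diner.values():
        # Each unordered pair of distinct items counts once per diner.
        for a, b in combinations(sorted(set(items)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def also_liked(item, counts, top_n=3):
    """Items most often ordered by the same diners as `item`."""
    neighbors = counts.get(item, {})
    # Highest count first; ties broken alphabetically for stable output.
    ranked = sorted(neighbors.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


# Hypothetical toy data: diner -> items they have ordered.
orders = {
    "ana": ["pad thai", "green curry", "spring rolls"],
    "ben": ["pad thai", "green curry"],
    "cam": ["pad thai", "spring rolls", "mango sticky rice"],
    "dee": ["green curry", "mango sticky rice"],
}

counts = co_occurrence(orders)
print(also_liked("pad thai", counts))
```

With only four diners, a single extra order can reorder the list, which is the small-universe problem Belousova describes for neighborhood restaurants.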

In the parlance of data scientists, food is an unstructured domain. Grubhub had 14 million menu items and the only thing they had in common was that sometimes people ate them. So Belousova’s team set out to build its own taxonomy of food.

They realized they had three independent but overlapping datasets. First they had the menus, full of the unique snowflake language each restaurant used for each dish but with some commonalities. Luckily, since restaurants give their menus to Grubhub and Grubhub translates them for the website, the people making the food are incentivized to give a lot of information.

Second, Grubhub had user search logs and reviews. Those could show what people looked for and what they eventually ordered. And the company could limit the production of that data to actual, knowledgeable customers, since the service only gives reviewing rights to those who’ve actually ordered food. That only works on a platform where people are talking about stuff they’ve purchased; someplace like, oh, say, Yelp ends up being more of a free-for-all and can be less useful.

And third, they had order history for customers and, maybe more importantly, the volume of orders for each menu item. In this construction, more orders per item tells you that the specific item is of high quality—or at least is popular, which, yes, isn’t necessarily the same thing. But one might be a proxy for the other.

The tech team built an algorithm that could ingest all that data and begin to understand what the menus were actually saying. Almost. Because then they had to define what “is” is. Which is to say, like, what are bagels, really? What if the menu doesn’t call the boiled-dough baked round-with-a-hole bread product served with cream cheese and lox a bagel? It’s still a bagel, right?

This is a problem of nomenclature, and the algorithm was supposed to learn not only what a basic food is, from adobo to za’atar, but its characteristics: culinary metadata like spicy-versus-mild or vegetarian or what culture it hails from. Grubhub’s data team learned to extract significant terms from menus and overlay them with search terms and with whether those searches ended in orders. “We were envisioning a graph of dishes in the cloud, connected to each other,” Belousova says. “You need chefs, diner vocabulary, and order vocabulary. Overlay those three datasets together and you get those relationships.” It was a feedback loop innovative enough that they filed a patent on it.
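One way to picture that overlay is as a weighted graph linking the words diners search for to the words chefs put on menus, with weight added only when a search ends in an order. The sketch below is a loose illustration under that assumption, not the patented method; the stopword list, the log format, and the function names are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"with", "and", "the", "a", "of", "in", "served"}


def terms(text):
    """Lowercase word tokens with the stopwords removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def link_search_to_menu(events):
    """Weight (search term, menu term) pairs by searches that ended in an order."""
    weights = Counter()
    for query, menu_title, ordered in events:
        if not ordered:
            continue  # searches that did not convert add no evidence here
        for q in set(terms(query)):
            for m in set(terms(menu_title)):
                weights[(q, m)] += 1
    return weights


# Hypothetical log rows: (search query, menu item title, did an order follow?)
events = [
    ("bagel", "Boiled round bread with lox and cream cheese", True),
    ("bagel", "Everything bagel with lox", True),
    ("bagel", "Blueberry muffin", False),
    ("lox", "Everything bagel with lox", True),
]

weights = link_search_to_menu(events)
print(weights[("bagel", "lox")])
```

In this toy log, the menu item that never uses the word “bagel” still gets tied to the search term “bagel” through the order that followed, which is the kind of relationship the three datasets were meant to surface.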

But, yeah, so, it didn’t work.

Cookbook Author Turns Data Cook

That’s not totally fair. “You can cover maybe 35 to 40 percent of every menu if you have a good algorithm,” Maloney says. “But all the corner cases were unique.”

Grubhub went looking for help. It came in the shape of Melissa Schreiber, a culinary school grad and author of two books about the food of Brooklyn. “I came in and they handed me the classifications of all the menu items on our platform, and they weren’t organized into usable categories for search,” Schreiber says. “I basically tuned up what the data had turned up.”

Schreiber created a cuisine dictionary for the data team that broke down the ingredients in many of the dishes, an internal document that included names of cuisines, history, sometimes maps to show the geographic relationships. She built decks to explain to the data scientists dishes that didn’t have obvious names. “The taxonomy was obviously data driven, and it needed that human touch, that finesse of somebody that understood food more than data,” Schreiber says.

She helped the team map dishes to cuisines, drawing lines like the one between Japanese curry rice and Indian curries, let’s say, or how to separate tacos from burritos. “Do you have Sushirrito in San Francisco?” Schreiber asks me. “That was weeks of conversation. Is it sushi? Is it a burrito? Every time someone would go they’d take a picture of it and post it to me.”

All that fed back into making search more rational. If you’re looking for fish, do you want Dover sole or chirashi? When you order Chinese, maybe you think about the protein first, whereas with Mexican maybe you’re thinking, torta or combinacion? The data team took Schreiber’s edits and incorporated them into the search and recommendation algorithms.

Finding the Best Banh Mi

The result? A taxonomy of about 4,000 dishes, with every item in the menu database classified into multiple categories and subcategories. It’s not as sophisticated as what a data scientist might crave, but it does break into ideas as disparate as appetizers versus mains and healthy versus pizza.

“Our system is a vector of preference,” says Belousova, somewhat cryptically. “Now that you understand what every menu item is and what every diner likes, you can tie things together.”

Order from Grubhub a lot, and the system will build a taste profile for you and then suggest restaurants near you that match the profile, via email or a notification. Order one dish from a bunch of places, and the system will tell you where a lot of people order that dish. “If I know there’s a specific banh mi sandwich ordered 30 times by 1,000 people who live within one mile of you, that’s a good indicator that’s an amazing sandwich,” Maloney says. “If I know you’ve had six different chicken vindaloos from six restaurants with no re-orders, I know you’re looking, and I know from other people’s data what the most popular chicken vindaloo is. You better believe I’m putting that front and center for you.”
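Once every menu item carries a shared dish label, the “most popular nearby” lookup Maloney describes becomes a simple filter and sort. The following is a minimal sketch under that assumption; the `DishStats` fields, the one-mile default, the tie-breaking rule, and the sample numbers are all hypothetical rather than Grubhub’s actual ranking.

```python
from dataclasses import dataclass


@dataclass
class DishStats:
    restaurant: str
    dish: str              # shared dish label, assumed to come from a taxonomy
    distance_miles: float  # distance from the diner
    orders: int            # total orders of this dish at this restaurant
    distinct_diners: int   # how many different diners placed those orders


def rank_nearby(stats, dish, max_miles=1.0):
    """Rank restaurants serving `dish` within `max_miles`.

    Sorts by total orders, then by orders per diner (a rough reorder
    signal), then by name so the result is deterministic.
    """
    nearby = [
        s for s in stats
        if s.dish == dish and s.distance_miles <= max_miles and s.distinct_diners > 0
    ]
    return sorted(
        nearby,
        key=lambda s: (-s.orders, -(s.orders / s.distinct_diners), s.restaurant),
    )


# Hypothetical sample data.
stats = [
    DishStats("Saigon Deli", "banh mi", 0.4, 120, 80),
    DishStats("Pho Corner", "banh mi", 0.9, 300, 150),
    DishStats("Far Away Cafe", "banh mi", 3.2, 900, 400),
    DishStats("Curry House", "chicken vindaloo", 0.5, 500, 200),
]

for s in rank_nearby(stats, "banh mi"):
    print(s.restaurant, s.orders)
```

The sort only works because both restaurants’ sandwiches map to the same `dish` value; without the taxonomy, the two menu entries would just be two unrelated strings.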

To be fair, lots of online food delivery businesses work with their data and have some kind of predictive recommendation algorithm. And it’s always challenging. “Some places are just a pizza restaurant. All they serve is pizza, and you don’t get a subcategory of ‘marinara’ or ‘margherita,’” says Enu Herzberg, head of data at Postmates. “And some places—imagine the Cheesecake Factory, with a subclass of every food on Earth.” So Postmates relies on collaborative filtering. Basically, you’ll probably like things that other people like, if they also like some of the things you like.

Postmates ingests menus, too, structuring some data itself, then using natural-language processing and other techniques to make distinctions that data scientists like, such as between a “category” and an “item.” “As you’re typing in the word ‘burger,’ we’re dynamically both searching the names of merchants and scanning menus,” Herzberg says. “You always pray for a cleaner dataset, but we’re pragmatic as well.” And Postmates is also learning about timing—about the kinds of things people generally order at a given time of afternoon, or more toward the beginning of a week for lunch (salad) versus the end (fried carbohydrates). That helps with recommendations for users, and it helps with optimizing where and when to send the people doing the deliveries.

Another leading company, DoorDash, uses its data for that kind of optimization as well—for its users and maybe more interestingly for the delivery runners, which the company calls dashers. “You want to make sure the customer gets the food at the time they expect. You want to get it at the best quality from the merchant,” says Rajat Shroff, DoorDash’s VP of product. “And we want to make sure the dashers don’t waste their time waiting around.” So its algorithms do load balancing based on dasher location, delivery address, and restaurant speed. “Zero wait time. That’s what the prediction algorithms are trying to do,” Shroff says.

All of which is why it was worth it to Maloney to build the artisanal menu database. Everyone is using collaborative filters to deliver recommendations. He’d like Grubhub to offer more. It cut data-sharing deals with Yelp and Foursquare; partnered with the company that owns KFC, Pizza Hut, and Taco Bell; and it’s buying up competitors like Yelp’s Eat24 delivery directory to increase to 80,000 the number of restaurants on the list. That’s big.

But the business is only going to get more competitive. A report from McKinsey says that in 2016, 30 percent of food-delivery orders were placed online, a figure it expects to increase to 65 percent by 2020. Morgan Stanley thinks online delivery could be a $220 billion market in 2020, 40 percent of total restaurant sales. But McKinsey says Grubhub, which connects diners to restaurants that actually handle the deliveries, will face more competition from “new delivery companies” that provide their own vehicles and logistics, giving those companies access to higher-end restaurants that want to reach customers without running their own deliveries. The Wall Street Journal points out that DoorDash just got funding to expand to 1,600 North American cities.

And then, as is customary to say at this point in this kind of story, there is Amazon. In this case, logistical legerdemain that combines the Grubhub-like Amazon Restaurants with delivery from the Amazon-owned Whole Foods grocery stores could upend the whole business.

That’s why it was worth it to Maloney to tell his data team to figure out recommendations and search. That McKinsey report says that once people decide which online delivery platform to use, 80 percent of them stick with it. “Anything we can do to increase personalization and more accurately predict what you are more likely to eat is going to increase conversion rate, frequency rate, and your affinity for my platform,” Maloney says.

And that does suggest a problem with Maloney’s original pizza question. This data can tell you what people order the most, but it still can’t tell you, objectively, what kind of pizza is the best. So all I can tell you is that, according to Grubhub, Chicagoans order deep dish pizza 722 percent more than people in any other place in the United States. Data doesn’t lie, but you probably could have guessed that one. The fact that every other part of the country avoids deep dish? That’s what data scientists call “suggestive.” As a pizza scientist would say, especially one who also liked shrimp on her pie: correlation is not crustacean.
