New Emoji Are So Boring—but They Don't Have to Be

A new data set on the popularity of emoji reveals a problem with Unicode's approval process, along with a way to fix it.
[Image: thinking face emoji. Photograph: Unicode]

If you've been unenthused about the emoji of recent years, you're not alone. A flashlight? A toolbox? A fire extinguisher? A tin can? Who even uses these?

The emoji set to appear on your phone next year are similarly dismal. A screwdriver, a toothbrush, a bell pepper—seriously, what is this, a shopping center? When you think of emoji, you don't think of a laundry list of random objects. You think of iconic, sometimes weird, expressive faces, like the face with tears of joy, the thinking face, the angry devil, the smiling pile of poo, and the see-no-evil monkey, plus classic symbols like the thumbs-up and the heart. But the latest batch includes just three new faces and one new hand shape, compared with 49 new objects, from a roller skate and a rock to a plunger.

The reason for this slide into irrelevance? The Unicode Consortium—the organization in charge of determining which symbols our devices are supposed to recognize—has increasingly been measuring the wrong thing in the process of approving new emoji.

No one intends to encode boring emoji, of course. Unicode has three main criteria, and one of them is, "Is there substantial evidence that a large number of people will likely use this new emoji?" Sounds good in theory, but what actually is "substantial evidence"? Unicode doesn't consider emoji data that comes from petitions, corporate sponsorship, or nonpublic data sources, judging them too easy to manipulate. After it encoded the original set of emoji from Japanese cell phone carriers, Unicode started looking to search results: If you submit a proposal for a new emoji, you need to provide screenshots showing how many web pages are found when you search for its associated word or phrase on Google, Bing, and Google Video Search, plus data from Google Trends.

Unicode's official guidelines note that the median emoji has 500 million search results for a regular Google search, 25 million on Bing, and 75 million on Google Video Search. While "the values are factors that are taken into consideration, not hard limits," the Emoji Subcommittee generally takes a dim view of prospective emoji that aren't in this ballpark. For example, T. rex, which did become an emoji, meets the threshold—Google reports 554 million pages that mention this word—while ichthyosaur, which was rejected, is nowhere near it (less than a million).
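To make the benchmark comparison concrete, here is a minimal sketch in Python. The helper name, the dictionary layout, and the idea of reporting each count as a fraction of the median are illustrative assumptions, not Unicode's actual procedure. Only the three median figures and the T. rex count come from the text above; the ichthyosaur number is a placeholder standing in for "less than a million."

```python
# Median search-result counts for existing emoji, as quoted from Unicode's guidelines.
MEDIAN_BENCHMARKS = {
    "google": 500_000_000,
    "bing": 25_000_000,
    "google_video": 75_000_000,
}


def benchmark_ratios(counts):
    """Express each reported result count as a fraction of the median benchmark.

    `counts` maps a search engine name to a result count. Engines without a
    benchmark are ignored. Hypothetical helper, not Unicode's own method.
    """
    return {
        engine: count / MEDIAN_BENCHMARKS[engine]
        for engine, count in counts.items()
        if engine in MEDIAN_BENCHMARKS
    }


# Google count for T. rex as reported in the article.
t_rex = benchmark_ratios({"google": 554_000_000})
# Placeholder value (assumption) standing in for "less than a million" results.
ichthyosaur = benchmark_ratios({"google": 900_000})

print("T. rex:", t_rex)
print("ichthyosaur:", ichthyosaur)
```

A ratio above 1 means the candidate beats the median existing emoji on that search engine. As the guidelines say, these values are factors to weigh rather than hard limits, so the sketch reports ratios instead of a pass/fail verdict.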

Search results do have some advantages. It's harder (though not impossible) to astroturf your way to half a billion search results than half a billion electronic signatures on a petition. And unlike a proprietary internal data set, it's easy to verify a Google screenshot (you can just repeat the search yourself and look for that grey number right above your list of links). But search results also have a big disadvantage: Do people really make websites about the same sorts of things they use emoji for?

I've spent serious time over the past few years observing how people use emoji, and my hunch was no. But I wasn't able to prove it until a new Unicode data set came out a few weeks ago: It's a public list of all 1,468 emoji ranked by how much people are using them. (Emoji from 2018 and later were excluded for the time being, because they're not necessarily broadly available on all devices, so they might not be living up to their potential yet.) Unicode wouldn't specify the sources of the data (I'm assuming it's from major tech companies, many of which are Unicode members) but did tell me the data was international, from the past six months, and on a log scale according to the median for each emoji across several sources, to avoid getting skewed by outliers on a single platform.

There really hasn't been a comparable public data set like this before; everything else is seriously patchy. Emojipedia puts on its homepage the top half-dozen or so emoji by searches, which can give us an idea of when a new emoji is really taking off, but has no information on anything below that top handful. Emojitracker tracks emoji use in real time on Twitter, which seems great until you realize that no new emoji have been added to the tracker since 2015 and that certain emoji (such as the recycling sign for retweet) are way more popular in spam tweets than tweets by actual people. Periodically, some company trying to court publicity will put out a press release with the "top 50" or "top 100" emoji, often from mysterious sources and lumped together into mysterious categories that prevent one from being able to do any serious stats. To be clear, I've been citing them all anyway, because there's still an emoji chapter in my book about internet linguistics, but they haven't been rigorous or reliable enough to look for trends, except to notice that faces and hands and hearts are consistently the most popular categories.

These new stats let us dig deeper. Many of the most popular existing emoji would not have passed Unicode's search criteria if they'd been in place at the time: smiling face with smiling eyes, face with tears of joy, loudly crying face, sparkle heart, eggplant, smiley poo, devil face, see-no-evil monkey, party popper, bicep, crossed fingers, and shrug. None of these have anywhere near the benchmark 500 million results when you search for them in Google, even in 2019 when those results have been juiced by many pages about the emoji themselves. Instead, they got in by being on Japanese phones before Unicode started taking over the decision-making process. On the other hand, many emoji that do meet the search criteria have languished far below the median level of popularity since they were introduced, including scooter, pita with falafel, rhino, tin can of food, coat, fortune cookie, bobsled, pretzel, gloves, vampire, zebra, hedgehog, rockstar/singer, and astronaut.

To be sure, sometimes the results do align: red heart, heart-eyes, fire, balloon, thumbs-up, and thinking face are all very popular both as search results and as emoji. And surely the search criteria did manage to exclude some genuinely obscure candidates. (T. rex does pretty well both as an emoji and as a search result, but I doubt ichthyosaur would have achieved similar popularity.) But overall, using search results to predict emoji usage is, to update an idiom, a case of comparing apple emoji to orange emoji.

It's not just that emoji approved according to the newer criteria have had less time to catch on, because other emoji introduced in those same years have rocketed to popularity, like the thinking face and face surrounded by hearts. It's more about what concepts get encoded as emoji. Using search results biases us toward common nouns; that's how we get those rhinos and coats and vampires and pretzels. But people don't generally use emoji as substitutes for nouns. They could, but they don't. Instead, emoji are used in addition to words, as a way of providing further context or emotion or illustration, like how we use gestures alongside the spoken kind of language, and that's what faces and hands and hearts are particularly good at.

Five or 10 years ago, in earlier editions of Unicode, we didn't really know how (or even if) the world was going to start using emoji. Maybe they were just a Japanese thing, maybe people would actually have stuck them in the middle of their sentences in place of words or used emoji for the same things that they make websites about. But now that we do have this data, and I hope that this is why Unicode released it, we can add it as a useful counterbalance to search data, when people are proposing more new emoji. For example, if someone wants to propose an emoji for a new article of clothing (say, pajamas), they can see not just how well the word "pajamas" does in search by itself, but also compare it to the popularity of the existing clothing emoji.

So which kinds of emoji should we expect to see more and less of, if Unicode starts taking into consideration the popularity of existing emoji? To find out, I downloaded Unicode's emoji frequency data set, labeled all of the emoji by category (I believe this practice is now popularly referred to as "training the neural net"), and calculated some stats.

I used my own categories because I was interested in making finer-grained distinctions than are typically found on your emoji keyboard: distinguishing between round traditional faces like tears of joy and anger; "weird faces" with expressions on other characters such as the devil smiley, the heart-eyed cat, or the see-no-evil monkey; people in specific poses such as the shrugging person or dancer; people with no particular pose or expression representing archetypes, such as the redhead or astronaut; and groups of people such as all the various couples and families.

I also wanted to get a sense of the range in each category. It's easy to look at a level of popularity and notice that there are a lot of faces in the top bands or a lot of flags and symbols in the lowest ones. But would all faces be popular and would all symbols be unpopular, or are a few outliers skewing our perceptions of the whole group?

To figure this out, I calculated five statistics for each category: the level of the most and least popular emoji, the level of the emoji with the exact middle level of popularity (the median), and the levels of the emoji at the 25% and 75% marks of popularity (the first and third quartiles). Plotted as a box plot, this means that each box contains half of the emoji in a given category (those clustered just above and below the middle level of popularity), and the lines outside the box show the other half of the category (those further from the median). When a box is small and the lines are short, such as the animals, the emoji in that category have a very consistent level of popularity. When a box is large or the lines are long, such as with people in various poses, the emoji in that category have very different levels of popularity.
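For readers who want to reproduce this kind of summary, here is a minimal sketch in Python. The category names echo the ones used in this article, but the popularity levels are invented placeholders rather than values from Unicode's data set, and the sketch assumes that a higher level means a more popular emoji. Quartile conventions vary; this one uses the "inclusive" method of the standard library's `statistics.quantiles`, which may differ slightly from what a charting tool computes.

```python
from statistics import quantiles

# Invented popularity levels per category (assumption: higher = more popular).
LEVELS_BY_CATEGORY = {
    "animals": [5, 6, 6, 7],
    "pose person": [1, 3, 7, 9, 10],
}


def five_number_summary(levels):
    """Return (minimum, first quartile, median, third quartile, maximum)."""
    ordered = sorted(levels)
    # n=4 asks for the three cut points that split the data into quarters.
    q1, med, q3 = quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, med, q3, ordered[-1]


summaries = {
    category: five_number_summary(levels)
    for category, levels in LEVELS_BY_CATEGORY.items()
}

for category, (low, q1, med, q3, high) in summaries.items():
    # The box spans q1..q3; the lines (whiskers) run out to low and high.
    print(f"{category}: min={low} q1={q1} median={med} q3={q3} max={high}")
```

With these placeholder numbers, the "animals" box is narrow and its lines are short, while the "pose person" box is wide, which is the pattern of consistent versus scattered popularity described above.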

I've graphed these stats together below, in order by their medians.

Chart: Gretchen McCulloch/WIRED 

Unicode also helpfully tells us the median level of popularity in terms of emoji overall (out of 1,468 emoji, these are the ones that rank below #735 in popularity), which I've drawn as a dashed line on the graph. The middle looks fairly low because there are so many flags and most of them aren't very popular, but as an international standards body Unicode had to either encode all of the country flags or none of them. Usefully for our purposes, however, this means that when other categories of emoji are below median, it's really not a great sign. So, given this, we can see that it's probably not a good idea to add clothing emoji for a while: The median clothing emoji is already below the median overall. Sorry, pajamas. Similarly, new transportation emoji? Even worse idea.
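The comparison against the overall median can be sketched the same way. Again, the levels below are invented placeholders, not Unicode's figures, chosen only so that a large, mostly unpopular flag category pulls the overall median down; the function name is hypothetical, and a higher level is assumed to mean a more popular emoji. Note that the overall median is taken across every individual emoji, not across the category medians.

```python
from statistics import median

# Invented popularity levels per category (assumption: higher = more popular).
LEVELS = {
    "hearts": [8, 9, 10],
    "hands": [7, 8, 9],
    "plants": [6, 7, 8],
    "flags": [1, 2, 2, 3, 3, 7],
    "clothing": [2, 3, 4],
    "transportation": [1, 2, 3],
}


def categories_below_overall(levels_by_category):
    """Return the overall median level and the categories whose median is below it."""
    all_levels = [
        level for levels in levels_by_category.values() for level in levels
    ]
    overall = median(all_levels)
    below = sorted(
        category
        for category, levels in levels_by_category.items()
        if median(levels) < overall
    )
    return overall, below


overall_median, crowded_categories = categories_below_overall(LEVELS)
print("overall median level:", overall_median)
print("categories below it:", crowded_categories)
```

A category that lands in the second list is one where, by this rough measure, new proposals are least likely to pay off.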

The hearts, smiley faces, and hand gestures are indeed the most popular, confirming the results in less complete data sets, with every single member of these categories scoring above median. (Disclosure: I noticed that hand emoji were punching above their weight in the sketchier data sets and yet not making up many new proposals while I was writing Because Internet, so I've already started writing proposals for more, several of which are currently before the Unicode Consortium.)

Faces with an emotional expression on some sort of other character ("weird faces"), like see-no-evil monkey 🙈, heart-eyed cat 😻, and smiling pile of poo 💩 also do really well, with every single one of them except for the Easter Island head 🗿 scoring above the median emoji. (It's also debatable whether Easter Island head truly belongs in this category.) However, Unicode hasn't been encoding more weird faces recently, preferring to encode people with neutral expressions with outfits/appearance ("kind of person" such as redhead or astronaut) or in particular poses ("pose person" such as dancer or skier). Appearances and poses are easy to show in search results, but turn out to be substantially less popular in actual use than the difficult-to-measure weird faces from the original Japanese emoji set.

One surprise from the data is that plants are much more popular than animals, even though many more animals than plants have been added in recent years. Even the lowest-ranked plant emoji is above the median overall, and as a cluster they do almost as well as the hand gestures. In comparison, the animal emoji are closer to the median—mostly still above it, but some do dip below. Moreover, if we look at what the most popular emoji are within the plants, we find that they're all flowers. We need to get through seven different kinds of flowers before the next most popular plant shows up, the herb 🌿. This is perhaps less surprising than it might seem at first glance. Flowers have a long history of being used symbolically and the most popular flower emoji is the rose 🌹, which has a dual symbolism for romance and socialism. Not to get too millennial about it, but this suggests there might actually be a case for a succulent emoji beyond the cactus. (Incidentally, within the animal category, those that do best are the ones shown as faces rather than full bodies, again counter to recent Unicode encoding trends.)

Reassuringly, symbols and objects don't do quite as badly as we might have predicted, displaying a huge range from the highly popular (🔥🎉✨ 💯 🎶) to extremely obscure (eject symbol ⏏️, Latin small letters 🔡, filing cabinet 🗄️, metal clamp 🗜️), and with medians just below the middle overall. The most popular clock face time is 12 o'clock 🕛 and the least popular time is 3:30 🕞. (Unsurprisingly, though, all the clock faces are below median.) However, even the most popular country-level flags don't do all that great, so Unicode would be wise not to open the giant can of worms that is encoding subcountry-level divisions like states, provinces, and other districts. As for nongeographic flags, we just don't have much data yet. The rainbow 🏳️‍🌈 and checkered 🏁 flags both do quite well, though (they're in the top 12 among hundreds of flags), so I'd expect the new pirate and trans flags to also be reasonably popular once their data starts rolling in.

I could go on like this for every category and subcategory, and that's something I'm hoping people will do with this data—annotate it with their own categories of interest, figure out which categories could stand to take a break from new emoji proposals, and propose emoji to fill gaps in popular categories that have recently been neglected. Of course, these rankings are all relative, so there is never going to be a Lake Emoji Statistics Wobegon, where all emoji are more popular than average. But we could try to make the categories less skewed with respect to each other—if animals get to have some more obscure entries, then why not also take a chance on more plants or weird faces? Or perhaps Unicode will eventually follow this data set with further data that contains absolute numbers.

Even if you're never going to write an emoji proposal, this emoji popularity data set is still exciting. At their most basic level, emoji are just a bunch of little pictures, and little pictures are nothing novel by themselves. In the long view of human culture, it's hard to think of much that's less novel than a bit of illustration. What makes emoji different from any other set of little pictures—clip art, Wikimedia Commons, stock photos, rebus puzzles, the much-overhyped cave paintings, or hieroglyphics—is that billions of people use them every day, in ordinary messages to each other.

In other words, what makes emoji interesting is usage data, data like this, data that shows how we're fleshing out the ways that we talk with each other so we can be more fully human online—even if we sometimes do so by sending weird animal faces.

