Netflix study shows limits of cosine similarity in embedding models


This article is part of our coverage of the latest in AI research.

In many machine learning applications, you need to measure the proximity of different objects. For example, to recommend a product to a user, you need to know which other users they are similar to or which products are similar to the ones they have previously purchased. In retrieval-augmented generation (RAG) applications, you need to find documents whose contents are relevant to the user’s prompt and add them to the context of the large language model (LLM).

There are different ways to measure similarity between objects. One that has become very popular in recent years is “cosine similarity,” a metric that measures the cosine of the angle between vectors. Many applications rely on cosine similarity to measure the proximity of low-dimensional embeddings of objects. 

But a new study by researchers at Netflix shows that cosine similarity can yield “arbitrary and therefore meaningless ‘similarities.’” Their findings show that the way deep learning models are trained can have unintended effects on the cosine similarity measure of their embeddings. 

Embeddings and cosine similarity

Embedding models are deep learning systems that learn a low-dimensional set of features from complex entities. For example, in LLMs, embeddings are numerical vectors that represent the semantic meaning of words based on their surrounding context. Image embeddings capture the visual features and contents of images. In recommendation systems, embeddings can represent the relationships between users and items.

Embeddings can be used as inputs to other models or as a tool to measure the similarity between two things. Cosine similarity measures the proximity of two embedding vectors by calculating the dot product of their L2-normalized values. At 1, the two vectors are perfectly aligned. At 0, they are perpendicular. And at -1, they point in opposite directions.

Cosine similarity has become a popular metric for measuring the “semantic similarity” between different entities. However, when normalizing the vectors, we discard their magnitude.

Therefore, if two vectors point in the same direction but have different lengths, they will still have perfect cosine similarity. The premise of using cosine similarity is that the magnitude of the vectors is not as relevant as their direction.
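This behavior is easy to see in code. The minimal sketch below computes cosine similarity as the dot product of normalized vectors and shows that scaling a vector (here, doubling it) leaves the similarity at a perfect 1.0:

```python
import numpy as np

def cosine_similarity(u, v):
    # Dot product of the L2-normalized vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

a = np.array([1.0, 0.0])
print(cosine_similarity(a, np.array([2.0, 0.0])))   # 1.0 (same direction, different length)
print(cosine_similarity(a, np.array([0.0, 3.0])))   # 0.0 (perpendicular)
print(cosine_similarity(a, np.array([-5.0, 0.0])))  # -1.0 (opposite direction)
```

Because the magnitudes cancel out in the normalization, only the angle between the vectors matters.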

“While there are countless papers that report the successful use of cosine similarity in practical applications, it was, however, also found to not work as well as other approaches, like the (unnormalized) dot-product between the learned embeddings,” the Netflix paper states. “We show that cosine similarity of the learned embeddings can in fact yield arbitrary results. We find that the underlying reason is not cosine similarity itself, but the fact that the learned embeddings have a degree of freedom that can render arbitrary cosine-similarities even though their (unnormalized) dot-products are well-defined and unique.”

Testing cosine similarity in recommendation systems

The researchers tested the hypothesis on a simple linear problem for recommendation systems. The embedding model is given a matrix that represents users and items and must learn a lower-dimensional representation that preserves the relevant features of the entities. This embedding model should then be able to measure the similarity of users based on the items they have interacted with.
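As a rough illustration of this kind of linear setup (not the authors’ exact protocol), a toy user-item matrix can be factored into low-dimensional user and item embeddings with a truncated SVD; the matrix and the choice of two factors here are hypothetical:

```python
import numpy as np

# Hypothetical user-item interaction matrix (4 users x 5 items);
# 1 means the user interacted with the item.
X = np.array([
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
], dtype=float)

# Keep only the top-k singular factors to get low-dimensional embeddings.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
user_emb = U[:, :k] * s[:k]  # one k-dimensional vector per user
item_emb = Vt[:k].T          # one k-dimensional vector per item

# The low-rank product approximates the original interactions.
print(np.round(user_emb @ item_emb.T, 1))
```

The rows of `user_emb` and `item_emb` are the embeddings whose similarities a recommender would then compare.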

Their findings show that the effect of cosine similarity depends a lot on how the embedding model is trained. “When a model is trained [with respect to] the dot-product, its effect on cosine-similarity can be opaque and sometimes not even unique,” the researchers observe.

This means that cosine similarity doesn’t map to the right features, and as a result, items that should be similar can have very different embeddings and items that are very different can have similar embeddings. Practically, this will result in bad recommendations, or more broadly, matching the wrong objects together.
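The degree of freedom the paper identifies can be demonstrated directly. In a dot-product model, rescaling the embedding dimensions of the users by an invertible diagonal matrix and the items by its inverse leaves every user-item prediction untouched, yet changes the item-item cosine similarities. A small numpy sketch with random stand-in embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))  # stand-in user embeddings (rows)
B = rng.normal(size=(5, 3))  # stand-in item embeddings (rows)

def cos_sim_matrix(M):
    # Pairwise cosine similarities between the rows of M.
    Mn = M / np.linalg.norm(M, axis=1, keepdims=True)
    return Mn @ Mn.T

# Rescale the embedding dimensions with an invertible diagonal matrix D.
D = np.diag([1.0, 10.0, 0.1])
A2 = A @ D
B2 = B @ np.linalg.inv(D)

# The user-item dot products, the quantity the model is trained on,
# are exactly unchanged...
assert np.allclose(A @ B.T, A2 @ B2.T)

# ...but the item-item cosine similarities are not.
print(np.allclose(cos_sim_matrix(B), cos_sim_matrix(B2)))
```

Since training only constrains the dot products, nothing pins down which of these rescaled (and equally valid) solutions the model lands on, so the resulting cosine similarities are arbitrary.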

“The motivation [of the study] was an internal research project where we noticed inconsistent results. We then dug deeper to understand the underlying cause,” Harald Steck, Senior Research Scientist at Netflix and lead author of the paper, told TechTalks.

The researchers caution against “blindly using cosine-similarity” and suggest several remedies, such as training the model for cosine similarity or applying normalization during or before training instead of during the measurement of cosine similarity. 
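One way to apply normalization during training, sketched below under the assumption of a simple dot-product model (the function names are illustrative, not from the paper), is to L2-normalize the embeddings inside the model’s forward pass. Any rescaling of the embedding dimensions then changes the training loss, which removes the freedom described above:

```python
import numpy as np

def l2_normalize(E, eps=1e-12):
    # Project each embedding row onto the unit sphere.
    return E / np.maximum(np.linalg.norm(E, axis=1, keepdims=True), eps)

def predict(user_emb, item_emb):
    # Normalizing inside the model makes the trained dot product
    # identical to the cosine similarity measured afterward.
    return l2_normalize(user_emb) @ l2_normalize(item_emb).T
```

With this setup, the scores the model is optimized for and the cosine similarities used downstream are the same quantity, bounded between -1 and 1.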

An easier first step is to create a validation set for your embedding model that is representative of your problem space. You can use these examples to test your embedding model with different similarity measures, including the dot product, Euclidean distance, and cosine similarity. Sometimes, just changing the proximity measure can considerably improve the results of your model.
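Such a comparison can be as simple as checking, for a handful of labeled triplets, which measures rank the similar item above the dissimilar one. The triplets below are hypothetical placeholders for your own validation data:

```python
import numpy as np

def dot_sim(u, v):
    return float(u @ v)

def neg_euclidean(u, v):
    # Negated so that higher always means "more similar".
    return -float(np.linalg.norm(u - v))

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical validation triplets: (anchor, similar item, dissimilar item).
triplets = [
    (np.array([1.0, 0.2]), np.array([0.9, 0.3]), np.array([-0.4, 1.0])),
    (np.array([0.1, 2.0]), np.array([0.3, 1.5]), np.array([2.0, -0.1])),
]

for name, sim in [("dot", dot_sim), ("euclidean", neg_euclidean), ("cosine", cosine)]:
    correct = sum(sim(a, p) > sim(a, n) for a, p, n in triplets)
    print(f"{name}: {correct}/{len(triplets)} triplets ranked correctly")
```

On real embeddings, the measures will often disagree, and the one that ranks your validation triplets best is the better default for your application.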

“As there are so many applications, and each is possibly different, it’s hard to provide a catch-all solution,” Steck said. “In general, I think, it is a good idea to check if the obtained cosine similarities of the embeddings make sense in one’s application, not only by calculating some metrics, but also by inspecting them manually. And as we outlined in the paper, the main problem is not the cosine similarity itself, but the embeddings it is computed from, as the learned embeddings may have some degrees of freedom that can make them work better or worse when used in cosine similarity.”
