My binary vector search is better than your FP32 vectors. I’m still trying to get my head around this, but here’s what I understand so far.
Embedding vectors as calculated by models such as OpenAI text-embedding-3-small are arrays of floating point values, which look something like this:
[0.0051681744, 0.017187592, -0.018685209, -0.01855924, -0.04725188...]—1356 elements long
Different embedding models have different lengths, but they tend to be hundreds up to low thousands of numbers. If each float is 32 bits that’s 4 bytes per float, which can add up to a lot of memory if you have millions of embedding vectors to compare.
If you look at those numbers you’ll note that they are all pretty small positive or negative numbers, close to 0.
Binary vector search is a trick where you take that sequence of floating point numbers and turn it into a binary vector—just a list of 1s and 0s, where you store a 1 if the corresponding float was greater than 0 and a 0 otherwise.
For the above example, this would start [1, 1, 0, 0, 0...]
Incredibly, it looks like the cosine distance between these 0 and 1 vectors captures much of the semantic relevant meaning present in the distance between the much more accurate vectors. This means you can use 1/32nd of the space and still get useful results!
Ce Gao here suggests a further optimization: use the binary vectors for a fast brute-force lookup of the top 200 matches, then run a more expensive re-ranking against those filtered values using the full floating point vectors.
Recent articles
- ChatGPT in "4o" mode is not running the new features yet - 15th May 2024
- Slop is the new name for unwanted AI-generated content - 8th May 2024
- Weeknotes: more datasette-secrets, plus a mystery video project - 7th May 2024
- Weeknotes: Llama 3, AI for Data Journalism, llm-evals and datasette-secrets - 23rd April 2024
- Options for accessing Llama 3 from the terminal using LLM - 22nd April 2024
- AI for Data Journalism: demonstrating what we can do with this stuff right now - 17th April 2024