> The obvious one outside of KV caches as mentioned above is vector databases. Any RAG pipeline that stores embedding vectors for retrieval benefits from the same compression. TurboQuant reduces indexing time to “virtually zero” on vector search tasks and outperforms product quantisation and RaBitQ on recall benchmarks using GloVe vectors.
This part sounds especially cool. I hadn't thought about this application when reading the other articles about TurboQuant. It would be great to have access to this kind of performance optimization for local RAG.
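To make the idea concrete, here is a minimal sketch of scalar quantization applied to embedding vectors, the general technique the quote is about. This is *not* TurboQuant's algorithm; it is a plain per-vector int8 scheme, and the corpus of random unit vectors is a stand-in for real GloVe or RAG embeddings. It shows the core trade-off: 4x less memory per vector, with recall measured against exact search.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus of unit-norm vectors standing in for real embeddings.
docs = rng.standard_normal((1000, 64)).astype(np.float32)
docs /= np.linalg.norm(docs, axis=1, keepdims=True)

def quantize_int8(x):
    """Per-vector symmetric 8-bit scalar quantization (4x smaller than fp32)."""
    scale = np.abs(x).max(axis=1, keepdims=True) / 127.0
    q = np.round(x / scale).astype(np.int8)
    return q, scale

q_docs, scales = quantize_int8(docs)

def search(query, k=10):
    # Dequantize on the fly; a real engine would use integer dot products.
    approx_scores = (q_docs.astype(np.float32) * scales) @ query
    return np.argsort(-approx_scores)[:k]

query = docs[0]
exact_top10 = set(np.argsort(-(docs @ query))[:10])
approx_top10 = set(search(query))
recall = len(exact_top10 & approx_top10) / 10
print(f"recall@10 after int8 quantization: {recall:.2f}")
```

Methods like TurboQuant, product quantisation, and RaBitQ push well below 8 bits per dimension while keeping recall high; the sketch above just illustrates what "compressing the vectors without wrecking retrieval" means in practice.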