Their self-reported benchmarks have them outperforming Pinecone by 7x in queries per second: https://zvec.org/en/docs/benchmarks/
I'd love to see those results independently verified, and I'd also love a good explanation of how they're getting such great performance.
Author here. Thanks for the interest! On the performance side: we've applied optimizations like prefetching, SIMD, and a novel batch distance computation (similar to a GEMV operation) that alone gives ~20% speedup. We're working on a detailed blog post after the Lunar New Year that dives into all the techniques—stay tuned!
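For readers wondering what "batch distance computation similar to a GEMV" means in general: instead of looping over candidates one at a time, you compute the query's dot products against a whole block of candidate vectors as one matrix-vector product, which vectorizes well. A minimal NumPy sketch of that general technique (illustrative only, not zvec's actual code; all names here are made up):

```python
import numpy as np

def batch_l2_distances(query, candidates):
    """Squared L2 distances from one query to many candidates via a
    single matrix-vector product (GEMV) plus precomputable norms,
    using the identity ||q - c||^2 = ||q||^2 - 2*q.c + ||c||^2."""
    dots = candidates @ query  # the GEMV: one pass over all candidates
    cand_norms = np.einsum("ij,ij->i", candidates, candidates)
    return (query @ query) - 2.0 * dots + cand_norms

rng = np.random.default_rng(0)
q = rng.standard_normal(128).astype(np.float32)
C = rng.standard_normal((10_000, 128)).astype(np.float32)
d = batch_l2_distances(q, C)
```

The candidate norms can be computed once at index-build time, so the per-query cost is essentially just the GEMV.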
And we always welcome independent verification—if you have any questions or want to discuss the results, feel free to reach out via GitHub Issues or our Discord.
It is absolutely possible, and not even that hard. With Redis Vector Sets you will easily see 20k-50k queries per second (depending on hardware) with tens of millions of entries, and the results don't degrade much as you scale further. Of course, all of that is serving data from memory, as Vector Sets does. Note: I'm not talking about the RediSearch vector store, but the new "vector set" data type I introduced a few months ago. The HNSW implementation of Vector Sets (AGPL) is quite self-contained and easy to read if you want to see how to achieve similar results.
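For anyone who wants the gist before reading the C source: the core of an HNSW query on each layer is a greedy best-first search that keeps a bounded result set of size `ef` and stops when the nearest unexplored candidate is farther than the worst result so far. A toy single-layer Python sketch of that general algorithm (not the Vector Sets implementation; names are made up):

```python
import heapq
import numpy as np

def greedy_search(graph, vectors, query, entry, ef=10):
    """Best-first search over one HNSW layer. `graph` maps a node id
    to its neighbor ids; `vectors` holds the embeddings. Keeps a
    min-heap of candidates and a bounded max-heap of results."""
    dist = lambda i: float(np.sum((vectors[i] - query) ** 2))
    visited = {entry}
    candidates = [(dist(entry), entry)]   # min-heap by distance
    results = [(-dist(entry), entry)]     # max-heap via negation
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:
            break  # nearest candidate is worse than worst result: done
        for nbr in graph[node]:
            if nbr in visited:
                continue
            visited.add(nbr)
            dn = dist(nbr)
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nbr))
                heapq.heappush(results, (-dn, nbr))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the current worst
    return sorted((-d, n) for d, n in results)  # (distance, node) pairs
```

The real data structure layers several such graphs of decreasing density on top of each other, but the inner loop above is where nearly all the time (and all the SIMD distance work) goes.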
Pinecone scales horizontally (which creates overhead, but accommodates more data).
A better comparison would be with Meta's FAISS.
PGVectorScale claims even more. Also want to see someone verify that.
8K QPS is probably quite trivial on their setup and a 10M dataset. I rarely use comparably small instances & datasets in my benchmarks, but on 100M-1B datasets on a larger dual-socket server, 100K QPS was easily achievable in 2023: https://www.unum.cloud/blog/2023-11-07-scaling-vector-search... ;)
Typically, the recipe is to keep the hot parts of the data structure in SRAM (the CPU caches) and use a lot of SIMD. At the time of those measurements, USearch used ~100 custom kernels for different data types, similarity metrics, and hardware platforms. The upcoming release of the underlying SimSIMD micro-kernels project will push this number beyond 1000. So we should be able to squeeze a lot more performance later this year.
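The "~100 custom kernels" idea is just runtime dispatch: one specialized routine per (metric, data type) pair, picked by a lookup at call time. A toy sketch of that dispatch pattern (hypothetical names; NumPy stands in for hand-written SIMD kernels, and real libraries also branch on CPU features like AVX-512 or NEON):

```python
import numpy as np

def dot_f32(a, b):
    # float32 dot product; a real kernel would use FMA SIMD lanes.
    return float(a @ b)

def cos_f32(a, b):
    # Cosine distance for float32 vectors.
    return 1.0 - float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def dot_i8(a, b):
    # int8 dot product, accumulating in int32 to avoid overflow,
    # mirroring how widening SIMD instructions accumulate.
    return int(a.astype(np.int32) @ b.astype(np.int32))

# Dispatch table: (metric, dtype) -> specialized kernel.
KERNELS = {
    ("dot", np.float32): dot_f32,
    ("cos", np.float32): cos_f32,
    ("dot", np.int8): dot_i8,
}

def similarity(metric, a, b):
    """Pick the right kernel for the metric and the arrays' dtype."""
    return KERNELS[(metric, a.dtype.type)](a, b)
```

Multiply a handful of metrics by a handful of dtypes by several instruction-set variants and the kernel count grows multiplicatively, which is how you get into the hundreds.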