Hacker News

cluckindan, yesterday at 8:24 PM (2 replies)

How is this different from running tuned HNSW vector indices on Elasticsearch?


Replies

talipozturk, today at 12:12 AM

Co-founder of Vectroid here. We forked Lucene. Lucene is great for search in general, for filters, and of course for full-text search. It is very mature and well supported by many big names and excellent engineers. We take advantage of that, but we had to change a few things to make it work well for the vector use case. We think vectors should be the primary data type, since they are the hardest to deal with. For instance, we modified Lucene to use an arbitrary number of CPUs/threads to build a single segment index. As a result, when needed, we can use hundreds of CPUs to index faster and produce fewer segments, which in turn lowers query latency. We also built a custom file system Directory for Lucene that works directly off GCS (S3 to come later). It can bypass the kernel, reading from the network and writing directly into memory: no SSD, no page cache, no mmap involved. Perhaps I should not say more...
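To illustrate the point about segment count: an index split into segments is searched one segment at a time, and the per-segment results are merged into a global top-k. The sketch below is a minimal illustration of that merge, not Vectroid's or Lucene's code. It assumes brute-force dot-product scoring as a stand-in for a per-segment HNSW search, and the names `search_segment`, `search_index`, and `split` are hypothetical.

```python
import heapq
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def search_segment(segment, query, k):
    # Brute-force top-k inside one segment (stand-in for a per-segment HNSW search).
    return heapq.nlargest(k, ((dot(vec, query), doc_id) for doc_id, vec in segment))

def search_index(segments, query, k):
    # Every segment is searched separately, then the candidates are merged.
    candidates = []
    for segment in segments:
        candidates.extend(search_segment(segment, query, k))
    return heapq.nlargest(k, candidates)

def split(docs, n_segments):
    # Distribute documents round-robin over n_segments segments.
    return [docs[i::n_segments] for i in range(n_segments)]

random.seed(0)
docs = [(i, [random.random() for _ in range(8)]) for i in range(1000)]
query = [random.random() for _ in range(8)]

one_segment = search_index(split(docs, 1), query, k=5)
many_segments = search_index(split(docs, 50), query, k=5)

# Same answer either way, but the second call ran 50 separate searches
# and merged 250 candidates instead of 5.
assert one_segment == many_segments
```

The results are identical, but the number of separate searches and merged candidates grows with the segment count. With a graph index, searching many small graphs tends to cost more in total than searching one large graph, which is presumably why fewer segments help latency here.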

wwdmaxwell, yesterday at 10:30 PM

Aside from being serverless, this is like Elasticsearch but with a kind of built-in Redis-like layer, I think.