Hamming Distance for Hybrid Search in SQLite

67 points • by enz • last Sunday at 7:21 AM • 11 comments

Comments

USearch has a sqlite extension that supports various metrics, including Hamming distance, on standard sqlite BLOB columns. It gets similar performance and is very convenient.

(There's also an indexed variant that does faster lookups, but it uses a special virtual table layout that constrains the types of the other columns in the table.)

See https://github.com/unum-cloud/USearch. pip-installable for Python users.
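For readers who want to see what a Hamming metric over BLOB columns amounts to, here is a minimal sketch using only Python's standard `sqlite3` module. It does not use USearch or its API: the `hamming` SQL function, the `docs` table, and the `sig` column are names made up for this illustration, and a Python-level function like this will be much slower than a native extension.

```python
import sqlite3

def hamming(a: bytes, b: bytes) -> int:
    """Count differing bits between two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("blobs must have the same length")
    x = int.from_bytes(a, "big") ^ int.from_bytes(b, "big")
    return bin(x).count("1")

conn = sqlite3.connect(":memory:")
# Register the Python function as a two-argument SQL function (hypothetical name).
conn.create_function("hamming", 2, hamming)

# Hypothetical table: one binary signature per document, stored as a BLOB.
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, sig BLOB)")
conn.executemany(
    "INSERT INTO docs (id, sig) VALUES (?, ?)",
    [
        (1, bytes([0b11110000, 0x00])),
        (2, bytes([0b11111111, 0x00])),
        (3, bytes([0b00001111, 0xFF])),
    ],
)

query = bytes([0b11110000, 0x00])
# Brute-force scan: compute the distance for every row, keep the two closest.
rows = conn.execute(
    "SELECT id, hamming(sig, ?) AS d FROM docs ORDER BY d LIMIT 2",
    (query,),
).fetchall()
print(rows)
```

This is a full table scan on every query, which is the unindexed case described above.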

jonatron • today at 3:42 PM

You could first calculate the distance of the first n bits (e.g. 64, one popcountll) as a first pass, then calculate the full distance for candidates over a threshold from the first pass. This makes the search approximate, but depending on the application it can be worth it.
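A rough sketch of that two-pass idea in Python, assuming "over a threshold" means "prefix distance within a cutoff". The function name, the 8-byte (64-bit) prefix, and the cutoff of 16 are all illustrative choices, not values from the article.

```python
def hamming(a: bytes, b: bytes) -> int:
    """Count differing bits between two equal-length byte strings."""
    x = int.from_bytes(a, "big") ^ int.from_bytes(b, "big")
    return bin(x).count("1")

def two_pass_search(query, vectors, prefix_bytes=8, prefix_cutoff=16, k=3):
    """Approximate top-k by Hamming distance.

    Pass 1: compare only the first `prefix_bytes` bytes and keep vectors
            whose prefix distance is at most `prefix_cutoff`.
    Pass 2: compute the full distance for the survivors only.
    Returns a list of (distance, index) pairs, closest first.
    """
    q_prefix = query[:prefix_bytes]
    survivors = [
        i for i, v in enumerate(vectors)
        if hamming(q_prefix, v[:prefix_bytes]) <= prefix_cutoff
    ]
    scored = sorted((hamming(query, vectors[i]), i) for i in survivors)
    return scored[:k]

query = bytes(16)  # 128-bit all-zero signature
vectors = [
    bytes(16),                  # identical to the query
    b"\x01" * 16,               # one bit set per byte
    b"\xff" * 8 + bytes(8),     # all differences in the first 64 bits
    bytes(8) + b"\xff" * 8,     # all differences in the last 64 bits
]
result = two_pass_search(query, vectors)
print(result)
```

Vectors 2 and 3 are equally far from the query in full distance, but only vector 2 is dropped by the prefix pass. That asymmetry is the approximation being traded for speed.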

stephenheron • today at 4:00 PM

I've had good success with this: https://github.com/sqliteai/sqlite-vector, if you are using SQLite and want something a bit more "off the shelf".

andai • today at 4:43 PM

Has anyone tried keyword expansion in this context?

I had the idea of making a "poor man's embeddings" for document similarity. You want two documents to match even if they share no keywords, as long as their keywords are closely related, right? That seems like a very solvable problem.
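One way to picture it, as a toy sketch only: expand each document's keyword set through a table of related terms, then compare the expanded sets. The `RELATED` table here is hand-written and hypothetical; a real one would have to come from a thesaurus, co-occurrence statistics, or similar, and Jaccard overlap is just one possible similarity measure.

```python
# Hypothetical, hand-written table of related terms.
RELATED = {
    "car": {"automobile", "vehicle"},
    "automobile": {"car", "vehicle"},
    "doctor": {"physician"},
    "physician": {"doctor"},
}

def expand(keywords):
    """Return the keywords plus every term listed as related to one of them."""
    expanded = set(keywords)
    for kw in keywords:
        expanded |= RELATED.get(kw, set())
    return expanded

def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

doc_a = {"car", "doctor"}
doc_b = {"automobile", "physician"}

raw_sim = jaccard(doc_a, doc_b)                        # no shared keywords
expanded_sim = jaccard(expand(doc_a), expand(doc_b))   # overlap after expansion
print(raw_sim, expanded_sim)
```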

woadwarrior01 • today at 7:16 PM

There's also the recently released zvec[1], the tagline for which is: The SQLite of Vector Databases.

[1]: https://github.com/alibaba/zvec

esafak • today at 5:07 PM

Are today's models any good at helping write postgres or sqlite extensions?
