logoalt Hacker News

Sesse__yesterday at 11:03 AM3 repliesview on HN

> Overall, I think block bloom filters should be the default most people reach for.

I think this depends on how big your filters are. Most people think of Bloom filters as having to have hundreds of thousands of elements, but I frequently find them useful all the way down to 32 bits (!). (E.g., there are papers showing chained hash tables where each bucket has a co-sited tiny Bloom filter to check if it's worth probing the chain.) In the “no man's land” in-between with a couple ten thousand buckets, the blocking seems to be mostly negative; it only makes sense as long as you actually keep missing the cache.


Replies

vanderZwanyesterday at 8:05 PM

Are you talking about Cuckoo++ tables, perhaps? If not can you point me to the hash table you had in mind? Always fun to learn of a new approach.

https://github.com/technicolor-research/cuckoopp

show 1 reply
vlmutoloyesterday at 5:45 PM

Yeah, I agree with this. I think there are open addressing hash tables like Swiss Table that do something similar. IIRC, they have buckets with a portion at the beginning with lossy “fingerprints” of items, which kind of serve a similar purpose as a bloom filter.

hinkleyyesterday at 7:15 PM

Bloom filters are useful for sharding so it stands to reason that a hash table implemented with shards would benefit.