> Overall, I think block bloom filters should be the default most people reach for.
I think this depends on how big your filters are. Most people think of Bloom filters as having to have hundreds of thousands of elements, but I frequently find them useful all the way down to 32 bits (!). (E.g., there are papers showing chained hash tables where each bucket has a co-sited tiny Bloom filter to check if it's worth probing the chain.) In the “no man's land” in-between with a couple ten thousand buckets, the blocking seems to be mostly negative; it only makes sense as long as you actually keep missing the cache.
Yeah, I agree with this. I think there are open addressing hash tables like Swiss Table that do something similar. IIRC, they have buckets with a portion at the beginning with lossy “fingerprints” of items, which kind of serve a similar purpose as a bloom filter.
Bloom filters are useful for sharding so it stands to reason that a hash table implemented with shards would benefit.
Are you talking about Cuckoo++ tables, perhaps? If not can you point me to the hash table you had in mind? Always fun to learn of a new approach.
https://github.com/technicolor-research/cuckoopp