> At a high level it’s a bit contradictory; trying to reduce the tail latency of cold reads by doubling the cache footprint makes every other read even colder.
That’s my main hang-up as well. On one hand this is undeniably cool work, but on the other, efficient cache usage is how you maximize throughput.
This optimizes for (narrow) tail latency, but I do wonder at what performance cost. I would be super interested in hearing about real-world use cases.
This might be useful in a case where a small lookup table or similar is often evicted from cache, so that lookups are usually cold. Yet the lookup data might be small enough not to cause issues with cache pollution, increased bandwidth, or memory consumption.
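A minimal sketch of that use case, assuming the trick is simply to touch a small table periodically so it stays resident (the table name, size, and `keep_table_warm` helper are all hypothetical, not from the work being discussed):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical small lookup table: 512 * 8 B = 4 KiB, a small
 * fraction of a typical L1/L2, so keeping it warm pollutes
 * the cache only marginally. */
#define TABLE_ENTRIES 512
static uint64_t lookup_table[TABLE_ENTRIES];

/* Touch every 64 B cache line of the table so it stays resident.
 * Meant to be called periodically (idle loop, timer tick) so the
 * occasional real lookup is not a cold read. */
static void keep_table_warm(void) {
    for (size_t i = 0; i < TABLE_ENTRIES; i += 8) /* 8 entries per 64 B line */
        __builtin_prefetch(&lookup_table[i], 0, 3); /* read, high temporal locality */
}

static uint64_t lookup(size_t key) {
    return lookup_table[key % TABLE_ENTRIES];
}
```

The bandwidth cost is bounded by the table size times the refresh rate, which is exactly why this only makes sense for data that is both small and latency-critical.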