"Each benchmark was run multiple times, and I’m using the median to get rid of any potential outliers."
This is not how you should do benchmarks. Don't take the median, and you don't even need any "warming up".
Simply run each benchmark long enough and take only the best result. This is more reliable and correct.
Benchmarks are fine, but they will only be loosely correlated with the performance you actually measure in any specific use case.
There is still substantial performance to be gained by creating bespoke hashmap designs at every point of use in code. The high dimensionality of the algorithm optimization space makes it improbable that any specific hashmap algorithm implementation will optimally capture the characteristics of a use case or set of use cases. The variance can be relatively high.
It isn't uncommon to find several independent hashmap designs inside performance-engineered code bases. The sensitivity to small details makes it difficult to build excellent hashmap abstractions with broad scope.
The performance of hash tables and hash functions depends significantly on the data distribution, and they should be compared on real datasets.
I've covered it in my presentation: https://presentations.clickhouse.com/2017-hash_tables/
Not really comprehensive. Doesn't include my favorite https://github.com/greg7mdp/parallel-hashmap which adds thread safety on top of performance.
Note that this benchmark does not include boost::unordered_flat_map. This is an open addressing variant of boost::unordered_map which was only released in December 2022.
I wanted to mention this because boost::unordered_flat_map and boost::unordered_flat_set are among the fastest open addressing hash containers in C++ land. Internally, they use lots of cool SIMD tricks. If anyone is interested in the details, here's a nice blog post by the developer: https://bannalia.blogspot.com/2022/11/inside-boostunorderedf...