Benchmarks are fine, but they will only be loosely correlated with measured performance for any specific use case.
There is still substantial performance to be gained by creating bespoke hashmap designs at each point of use in code. The optimization space is high-dimensional, which makes it improbable that any single hashmap implementation will optimally capture the characteristics of a given use case or set of use cases, and the variance between implementations can be relatively high.
It isn't uncommon to find several independent hashmap designs inside performance-engineered code bases. The sensitivity to small details makes it difficult to build excellent hashmap abstractions with broad scope.
It's also the case that the performance of a hashmap, or anything else, in a small-scale benchmark may not reflect its performance in a large system that does things other than manage maps. There are side effects, such as how many icache lines are visited during a map operation, that don't hurt microbenchmarks but can matter in real systems. Microbenchmarking a data structure isn't completely pointless, but it can ultimately be misleading.
Definitely. As an extreme but fun example... in one project I had a massive hash map (~700 GB) that was concurrently read from and written to by 256 threads. The entries themselves were only 16 bytes, so I could use atomic cmpxchg, but the problem I hit was that even with 1 GB huge pages, I was running out of dTLB entries. So I assigned each thread to a subregion of the hash table, then used channels between each pair of threads to handle the reads and writes (and restructured the program a bit to allow this). Since the dTLB budget is per core, this got me essentially zero dTLB misses, and ultimately sped up the program by ~2x.