You seem to misunderstand @andersa's point which I think is well expressed - it doesn't matter if the indices are randomized if the CPU can pre-fetch what they will be. The power of CPU speculative execution to hide latency can be quite surprising the first time you see it.
This is a very small Nim program to demonstrate for "show me the code" and "it must just not be 'random enough'!" skeptics: https://github.com/c-blake/bu/blob/main/memlat.nim It uses the exact dependency idea @andersa mentions of a random cycle of `x[i] = i` that others else-sub-thread say some CPUs these days are smart enough to "see through". On Intel CPUs I have, the dependency makes things 12x slower at the gigabyte scale (DIMMs).
EDIT: This same effect makes many a naive hash table microbenchmark { e.g., `for key in keys: lookup(key)` } unrepresentative of performance in real programs where each `key` is often not speculatively pre-computable.
You seem to misunderstand @andersa's point which I think is well expressed - it doesn't matter if the indices are randomized if the CPU can pre-fetch what they will be. The power of CPU speculative execution to hide latency can be quite surprising the first time you see it.
This is a very small Nim program to demonstrate for "show me the code" and "it must just not be 'random enough'!" skeptics: https://github.com/c-blake/bu/blob/main/memlat.nim It uses the exact dependency idea @andersa mentions of a random cycle of `x[i] = i` that others else-sub-thread say some CPUs these days are smart enough to "see through". On Intel CPUs I have, the dependency makes things 12x slower at the gigabyte scale (DIMMs).
EDIT: This same effect makes many a naive hash table microbenchmark { e.g., `for key in keys: lookup(key)` } unrepresentative of performance in real programs where each `key` is often not speculatively pre-computable.