Hacker News

delusional · today at 5:19 AM · 4 replies

> By having a contiguous array of indices to look at, that array can be prefetched as it goes

Does x86-64 actually do this data-dependent, single-dereference prefetch? Because if so, I have some design assumptions to reevaluate.
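For concreteness, here is a minimal C sketch of the access pattern in question, assuming a plain gather loop (the function name `sum_indirect` is made up for illustration). Reads of `idx[]` are sequential, which is the easy case for a hardware stride prefetcher. Each `data[idx[i]]` load is data-dependent, because its address is only known once `idx[i]` has arrived. Whether a particular x86-64 core also prefetches those targets ahead of time is exactly the open question; the code itself can't answer it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: sum values reached through a contiguous index array.
 * idx[] is read front to back, so its loads form a simple sequential stream.
 * data[idx[i]] is the "single deref": its address depends on a loaded value. */
int64_t sum_indirect(const int32_t *data, const uint32_t *idx, size_t n)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += data[idx[i]]; /* address unknown until idx[i] is loaded */
    }
    return sum;
}
```

Note that the loads for different `i` are independent of each other, so an out-of-order core can have several of them in flight at once even without a dedicated indirect prefetcher.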


Replies

alain94040 · today at 3:57 PM

On modern CPUs? Most likely. Those kinds of optimizations are done by the core, with no compiler magic needed.

CPU implementations have become too complex to grasp. The only sure way to know how a CPU will behave on a given workload is to run that workload. It's good to have some basic expectations for performance, instructions per cycle, and memory bandwidth, so you can detect when something is off. I guess what I'm trying to say is that it's hard to keep in your head everything ~1B transistors are doing together to run your code. It's just too big.
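In the spirit of "run the workload", a minimal micro-benchmark sketch could time the same gather loop with sequential and with shuffled indices. Everything here is hypothetical: the function names, the xorshift64 shuffle (used instead of `rand()` because `RAND_MAX` may be as small as 32767), and the choice of `clock()`, which reports processor time. No expected timings are given because they depend entirely on the machine.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Small xorshift64 PRNG (Marsaglia's 13/7/17 shifts) for the shuffle. */
static uint64_t rng_state = 88172645463325252ULL;

static uint64_t xorshift64(void)
{
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 7;
    rng_state ^= rng_state << 17;
    return rng_state;
}

/* Fill idx with 0..n-1; if shuffle is nonzero, permute it (Fisher-Yates). */
static void make_indices(uint32_t *idx, size_t n, int shuffle)
{
    for (size_t i = 0; i < n; i++)
        idx[i] = (uint32_t)i;
    if (!shuffle || n < 2)
        return;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)(xorshift64() % (i + 1));
        uint32_t tmp = idx[i];
        idx[i] = idx[j];
        idx[j] = tmp;
    }
}

/* Time one pass of sum += data[idx[i]]. The sum is written to *out so the
 * result stays live and the loop is not removed as dead code. */
static double time_sum(const int32_t *data, const uint32_t *idx, size_t n,
                       int64_t *out)
{
    clock_t t0 = clock();
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += data[idx[i]];
    clock_t t1 = clock();
    *out = sum;
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}

/* Time the gather with sequential and with shuffled indices.
 * Returns 0 on success, -1 if allocation fails, -2 if the sums differ
 * (they should not: both index arrays are permutations of 0..n-1). */
int compare_access_patterns(size_t n, double *seq_s, double *rnd_s)
{
    int32_t *data = malloc(n * sizeof *data);
    uint32_t *idx = malloc(n * sizeof *idx);
    if (data == NULL || idx == NULL) {
        free(data);
        free(idx);
        return -1;
    }
    for (size_t i = 0; i < n; i++)
        data[i] = (int32_t)(i & 0xff);

    int64_t seq_sum, rnd_sum;
    make_indices(idx, n, 0);
    *seq_s = time_sum(data, idx, n, &seq_sum);
    make_indices(idx, n, 1);
    *rnd_s = time_sum(data, idx, n, &rnd_sum);

    free(data);
    free(idx);
    return seq_sum == rnd_sum ? 0 : -2;
}
```

To see a difference, `n` presumably needs to be large enough that `data` no longer fits in cache (for example `(size_t)1 << 24`). A single timed pass like this is crude; for real decisions, a proper benchmark harness or hardware performance counters are a better guide.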

phi-go · today at 6:13 AM

Hardware definitely supports this, but it might need compiler support, i.e. emitting explicit prefetch instructions. That might happen automatically, or it might require a pragma or a call to a builtin. Either way, it can be implemented.
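One way to issue those explicit prefetches by hand is the GCC/Clang builtin `__builtin_prefetch(addr, rw, locality)`. The sketch below is hypothetical: the function name and `PREFETCH_DISTANCE` (16 iterations) are placeholders, and a sensible distance can only be found by measuring on the target CPU. On x86-64 the builtin typically compiles to one of the `PREFETCHh` instructions, and it is only a hint the CPU may ignore.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tuning knob: how many iterations ahead to prefetch. */
#define PREFETCH_DISTANCE 16

int64_t sum_indirect_prefetch(const int32_t *data, const uint32_t *idx,
                              size_t n)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* Request data[idx[i + D]] early: 0 = prefetch for read,
             * 3 = high temporal locality (keep in all cache levels). */
            __builtin_prefetch(&data[idx[i + PREFETCH_DISTANCE]], 0, 3);
        }
        sum += data[idx[i]];
    }
    return sum;
}
```

Whether this helps depends on the core: if the hardware already covers the pattern, the extra instructions may just add overhead.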

shakna · today at 7:13 AM

The compiler probably does [0].

[0] https://gcc.gnu.org/projects/prefetch.html
