logoalt Hacker News

andersayesterday at 11:01 PM6 repliesview on HN

Note this is not true random access in the manner it occurs in most programs. By having a contiguous array of indices to look at, that array can be prefetched as it goes, and speculative execution will take care of loading many upcoming indices of the target array in parallel.

A more interesting example might be if each slot in the target array has the next index to go to in addition to the value, then you will introduce a dependency chain preventing this from happening.


Replies

wtallistoday at 2:17 AM

> A more interesting example might be if each slot in the target array has the next index to go to in addition to the value, then you will introduce a dependency chain preventing this from happening.

However, on some processors there's a data-dependent prefetcher that will notice the pointer-like value and start prefetching that address before the CPU requests it.

show 3 replies
hansvmtoday at 3:06 AM

Fun fact, that's part of why parsing protobuf is so slow.

show 1 reply
jltsirentoday at 6:03 AM

If the next index is stored in the target array and the indexes are random, you will likely get a cycle of length O(sqrt(n)), which can be cached.

You can avoid this with two arrays. One contains random query positions, and the target array is also filled with random values. The next index is then a function of the next query position and the previous value read from the target array.

show 1 reply
jiggawattstoday at 12:06 AM

This is why array random access and linked-list random access have wildly different performance characteristics.

Another thing I noticed is that the spike on the left hand side of his graphs is the overhead of file access.

Without this overhead, small array random access should have a lot better per-element cost.

kortillatoday at 7:10 AM

The article very clearly compares using randomized indexes and sequential. It’s kinda the point of the article.

show 1 reply
delusionaltoday at 5:19 AM

> By having a contiguous array of indices to look at, that array can be prefetched as it goes

Does x86 64 actually do this data dependent single deref prefetech? Because in that case I have a some design assumptions I have to reevaluate.

show 4 replies