Hacker News

How much linear memory access is enough?

56 points by PhilipTrettner last Wednesday at 1:16 PM | 8 comments

Comments

gwking today at 8:06 PM

I’ve casually experimented with this in Python a number of times for various hot loops, including ones where I’m passing the chunk between C routines. On Apple M1 I’ve never seen a case where chunks larger than 16 kB mattered. That’s the page size, so it’s totally unsurprising.

Nevertheless it’s been a helpful rule of thumb to not overthink optimizations.
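The setup described here (a Python loop handing chunks to C routines) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the commenter's code: `chunked_crc32` is a hypothetical helper, and `zlib.crc32` merely stands in for "a C routine" because it accepts a running value, so the chunked result must equal the one-shot result. The default chunk size is the OS page size from `mmap.PAGESIZE` (16 KiB on Apple M1 under macOS, typically 4 KiB on x86-64 Linux).

```python
import mmap
import zlib


def chunked_crc32(data: bytes, chunk_size: int = mmap.PAGESIZE) -> int:
    """Feed `data` to a C routine (zlib.crc32 as a stand-in) chunk by chunk.

    Hypothetical helper: the default chunk size is one OS page.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    view = memoryview(data)  # slicing a memoryview does not copy the bytes
    crc = 0
    for start in range(0, len(view), chunk_size):
        # Pass the running CRC so each call continues where the last stopped.
        crc = zlib.crc32(view[start:start + chunk_size], crc)
    return crc


if __name__ == "__main__":
    payload = bytes(range(256)) * 4096  # 1 MiB of test data
    print("page size:", mmap.PAGESIZE)
    print("matches one-shot CRC:", chunked_crc32(payload) == zlib.crc32(payload))
```

Using a memoryview keeps each chunk a zero-copy window into the original buffer, so the only per-chunk cost on the Python side is one call into C.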

smj-edison today at 7:48 PM

Side note, but this product looks really cool! I have a fundamental mistrust of all boolean operations, so seeing a system that actually handles degenerate cases correctly is refreshing.

aapoalas today at 7:45 PM

Would kernel huge pages possibly have an effect here also?
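On Linux, one way to try this from Python is to advise the kernel to back a mapping with transparent huge pages (THP). A minimal sketch, assuming Linux and Python 3.8+ (where `mmap.madvise` and `mmap.MADV_HUGEPAGE` exist); `alloc_with_thp_hint` is a hypothetical helper. The hint is advisory and only takes effect if `/sys/kernel/mm/transparent_hugepage/enabled` is set to `always` or `madvise`; the `AnonHugePages` field in `/proc/self/smaps` shows whether huge pages were actually used.

```python
from __future__ import annotations

import mmap


def alloc_with_thp_hint(size: int) -> tuple[mmap.mmap, bool]:
    """Allocate anonymous memory and, where supported, request THP backing.

    Hypothetical helper. Returns the mapping and whether the hint was issued;
    the kernel may still use normal pages either way.
    """
    buf = mmap.mmap(-1, size)  # anonymous mapping of `size` bytes
    advice = getattr(mmap, "MADV_HUGEPAGE", None)  # Linux-only constant
    if advice is None or not hasattr(buf, "madvise"):
        return buf, False
    try:
        buf.madvise(advice)  # applies to the whole mapping by default
    except OSError:
        # e.g. a kernel built without THP support
        return buf, False
    return buf, True


if __name__ == "__main__":
    mem, hinted = alloc_with_thp_hint(64 << 20)  # 64 MiB
    print("THP hint issued:", hinted)
    mem.close()
```

Huge pages mainly reduce TLB misses, so any effect would show up in workloads that stream through many pages, not within a single small chunk.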

PhilipTrettner last Wednesday at 1:27 PM

I looked into this because part of our pipeline is forced to be chunked. Most advice I've seen boils down to "more contiguity = better", but without numbers, or at least not generalizable ones.

My concrete tasks already reach peak performance with chunks below 128 kB, and I couldn't find pure processing workloads that benefit significantly from chunks beyond 1 MB. Code is linked in the post; it would be nice to see results on more systems.
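For a quick look at the chunk-size curve on another machine, a sweep could look like the following. This is a hypothetical Python harness, not the code linked in the post; the workload (a running CRC-32 via `zlib.crc32`), the 64 MiB buffer, and the 4 KiB to 4 MiB range are all assumptions. Each chunk costs one interpreter-level call, so small chunks are penalized by Python overhead as well as by memory effects, and the numbers describe this setup only.

```python
import time
import zlib


def process_in_chunks(data: bytes, chunk_size: int) -> int:
    # Hypothetical workload: running CRC-32 computed chunk by chunk.
    view = memoryview(data)
    crc = 0
    for start in range(0, len(view), chunk_size):
        crc = zlib.crc32(view[start:start + chunk_size], crc)
    return crc


def sweep(data: bytes, sizes, repeats: int = 3) -> dict:
    """Return best-of-`repeats` throughput (bytes/second) per chunk size."""
    results = {}
    for size in sizes:
        best = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()
            process_in_chunks(data, size)
            best = min(best, time.perf_counter() - t0)
        results[size] = len(data) / best if best > 0 else float("inf")
    return results


if __name__ == "__main__":
    data = bytes(64 * 1024 * 1024)  # 64 MiB of zeros
    sizes = [1 << k for k in range(12, 23)]  # 4 KiB .. 4 MiB
    for size, bps in sweep(data, sizes).items():
        print(f"{size // 1024:>6} KiB  {bps / 1e9:6.2f} GB/s")
```

Taking the best of several runs filters out one-off noise such as page faults on the first pass; the point where throughput stops improving is the "enough" chunk size for that workload.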

_zoltan_ today at 5:01 PM

Is this an attempt at nerd sniping? ;-)

In GPU databases we sometimes go up to the GB range per "item of work" (input permitting), as it's very efficient.

I need to add it to my TODO list to have a look at your GitHub code...

01HNNWZ0MV43FF today at 7:20 PM

This is good data, but I'm not sure what the actionable takeaway is for me as a Grug Programmer.

It means if I'm doing very light processing (sums) I should try to move that to structure-of-arrays to take advantage of cache? But if I'm doing something very expensive, I can leave it as array-of-structures, since the computation will dominate the memory access in Amdahl's Law analysis?

This data should tell me something about organizing my data and accessing it, right?
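The layout difference this question hinges on can be shown with points that have three float64 fields. A minimal sketch using the standard-library `array` module; the helper names and the (x, y, z) record are hypothetical. Summing only x over an array-of-structures walks a 24-byte stride to use 8 bytes, so most of each loaded cache line is wasted, while the structure-of-arrays version reads one contiguous run. In pure Python, interpreter overhead hides most of the cache effect; the layouts are what a compiled loop would see.

```python
from array import array


def make_aos(points):
    # Array-of-structures: x0 y0 z0 x1 y1 z1 ... in a single buffer.
    buf = array("d")
    for x, y, z in points:
        buf.extend((x, y, z))
    return buf


def sum_x_aos(buf):
    # Strided view over every third float64: only x is used, but y and z
    # sit in between and get pulled through the cache as well.
    return sum(memoryview(buf)[0::3].tolist())


def make_soa(points):
    # Structure-of-arrays: all x values contiguous, then y, then z.
    xs, ys, zs = array("d"), array("d"), array("d")
    for x, y, z in points:
        xs.append(x)
        ys.append(y)
        zs.append(z)
    return xs, ys, zs


def sum_x_soa(soa):
    xs, _, _ = soa
    return sum(xs)  # contiguous: every loaded byte is an x value


if __name__ == "__main__":
    pts = [(float(i), 0.0, 0.0) for i in range(1000)]
    print(sum_x_aos(make_aos(pts)), sum_x_soa(make_soa(pts)))
```

Roughly speaking, the more work done per element, the smaller the share of runtime spent waiting on memory, which is the Amdahl's Law intuition in the question: layout matters most for light, bandwidth-bound passes like sums.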