Every Byte Matters

187 points • by ingve • today at 11:04 AM • 86 comments

Comments

Tip: to get the L1/L2 cache sizes on a Mac, the command is

    $ sysctl -a | grep "l.*cachesize" | gnumfmt --field=2 --to=si
    hw.perflevel1.l1icachesize:   132k
    hw.perflevel1.l1dcachesize:   66k
    hw.perflevel1.l2cachesize:    4,2M
    hw.perflevel0.l1icachesize:   197k
    hw.perflevel0.l1dcachesize:   132k
    hw.perflevel0.l2cachesize:      13M
    hw.l1icachesize:   132k
    hw.l1dcachesize:   66k
    hw.l2cachesize:    4,2M

And the equivalent to LEVEL1_DCACHE_LINESIZE is

    $ sysctl -a | grep hw.cachelinesize
    hw.cachelinesize: 128

moring • today at 1:52 PM

The article shows nicely how "every byte matters" is false. First, it starts off by talking about the cost of a new field, when the actual topic is array-of-structs vs. struct-of-arrays. Then, this:

> How much of an impact can this have?
> Reading is:alive (1 byte) Across 1M Monsters

You aren't reading one byte here, you are reading 1M bytes! Of course, optimizing the access to 1M bytes is something to consider. Optimizing the access to one byte isn't.

The article is definitely worth reading IMHO, but it really needs a better headline!

noelwelsh • today at 11:46 AM

The JVM is currently pretty bad for memory allocation. Every object (i.e. not a primitive) has a header that IIRC is 12 bytes. But there is good news in JVM land: this will be reduced to 8 bytes in the next JVM release, and Project Valhalla will give the tools to do away with headers entirely in some cases. Project Valhalla also has tools to manage off-heap memory, which is important in many cases.

The JVM is an odd place where it requires too much heap to compete with the AOT compiled languages, but its startup time is too slow compared to interpreted languages. I think these enhancements are essential to keep the platform relevant.
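
To put the header figure above in perspective, here is a rough back-of-the-envelope estimator. It is only a sketch: the class name is made up, and every layout constant (12-byte object header, 16-byte array header, 8-byte alignment, 4-byte compressed references, one byte per boolean array element) is an assumption about a typical 64-bit HotSpot configuration, not something the language guarantees.

```java
// Hypothetical estimator; all constants are assumptions about a typical 64-bit HotSpot JVM.
final class HeaderOverhead {
    static final int OBJECT_HEADER = 12; // bytes, the figure quoted in the comment
    static final int ARRAY_HEADER = 16;  // assumption: object header plus a 4-byte length
    static final int ALIGNMENT = 8;      // assumption: objects are padded to 8-byte multiples
    static final int REFERENCE = 4;      // assumption: compressed references

    // Rounds a size up to the next multiple of ALIGNMENT.
    static long align(long bytes) {
        return (bytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // n objects that each hold a single boolean, plus the reference array pointing at them.
    static long objectsWithOneFlag(long n) {
        long perObject = align(OBJECT_HEADER + 1);
        long refArray = align(ARRAY_HEADER + n * REFERENCE);
        return n * perObject + refArray;
    }

    // One boolean[] holding the same n flags (assumed to use one byte per element).
    static long flagArray(long n) {
        return align(ARRAY_HEADER + n);
    }

    public static void main(String[] args) {
        long n = 1_000_000;
        System.out.println("objects:   " + objectsWithOneFlag(n) + " bytes");
        System.out.println("boolean[]: " + flagArray(n) + " bytes");
    }
}
```

Under those assumptions, a million one-flag objects come to roughly 20 MB, against roughly 1 MB for a single flag array, which is why the header size gets so much attention.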

pron • today at 11:55 AM

> The cost of each new field is rarely considered

Most developers, in Java and in most other languages, do not consider the cost of every field, but I can tell you that people who need micro-optimisations certainly do care, and in Java's standard library, layout is very much a concern (except, as always, you want to optimise what really matters; there's no point in optimising something that is unlikely to be a hot spot in a real program). Sometimes, though, you want to intentionally spread out the layout to avoid cache line sharing when concurrency is involved. You will find such examples in the standard library, too.
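
As a minimal sketch of the "spread out the layout" idea: the class below is hypothetical (not taken from the standard library) and gives each thread its own counter slot, spaced 128 bytes apart so that two slots can never land on the same cache line. The 128-byte spacing is an assumption chosen to cover both 64-byte and 128-byte lines.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical striped counter: every stripe's slot sits 128 bytes from its neighbours.
final class PaddedCounters {
    // Assumption: 128 bytes of spacing covers common cache line sizes (64 or 128 bytes).
    private static final int STRIDE = 128 / Long.BYTES; // 16 longs

    private final AtomicLongArray slots;
    private final int stripes;

    PaddedCounters(int stripes) {
        this.stripes = stripes;
        // One extra stride of leading padding keeps the first slot away from the array header.
        this.slots = new AtomicLongArray((stripes + 1) * STRIDE);
    }

    void increment(int stripe) {
        slots.incrementAndGet((stripe + 1) * STRIDE);
    }

    long sum() {
        long total = 0;
        for (int s = 0; s < stripes; s++) {
            total += slots.get((s + 1) * STRIDE);
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        int threads = 4;
        PaddedCounters counters = new PaddedCounters(threads);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int stripe = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 1_000_000; i++) {
                    counters.increment(stripe);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        System.out.println(counters.sum()); // 4000000
    }
}
```

The padding wastes 120 bytes per counter on purpose, so here more bytes buy less contention. `java.util.concurrent.atomic.LongAdder` is a standard-library class built around the same striping idea.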

ChrisMarshallNY • today at 2:55 PM

I started off with Machine Code, on a device with 256 bytes (not KB) of RAM. That was 256 bytes, to install the executable, reserve the stack, and set up the heap.

We often used bit (not byte) fields, to convey information.

Made life challenging.

However, being able to be sloppy has its definite advantages. It takes a long time to design highly-optimized stuff. If just declaring a couple of new properties takes thirty seconds, and designing a bitfield takes an hour, then we have some real cost-savings, there.

That said, it's easy to get crazy, these days. I just spent a couple of days, chasing down greedy memory hogs. These were operations that ate gigabytes of memory. I determined that the real culprit was actually Apple MapKit, and figured out a simple workaround, but it took a long time to get there. If I suspect the OS, then it's usually my fault, and trying everything before going back to the OS takes time.

forinti • today at 11:33 AM

So if you need speed, you just have to swallow your OO programmer's pride and put your data in arrays.
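
For anyone who hasn't seen the two layouts side by side, here is a small sketch. The class and field names are hypothetical; the thread only mentions monsters and an "alive" flag.

```java
// Hypothetical names throughout; only the monster/alive idea comes from the discussion.
final class MonsterLayouts {

    // Array-of-structs: one heap object per monster, each carrying its own header.
    static final class Monster {
        boolean alive;
        int health;
        double x, y;

        Monster(boolean alive, int health) {
            this.alive = alive;
            this.health = health;
        }
    }

    // Struct-of-arrays: one primitive array per field, indexed by monster id.
    static final class Monsters {
        final boolean[] alive;
        final int[] health;
        final double[] x;
        final double[] y;

        Monsters(int n) {
            alive = new boolean[n];
            health = new int[n];
            x = new double[n];
            y = new double[n];
        }
    }

    static int countAliveAoS(Monster[] monsters) {
        int count = 0;
        for (Monster m : monsters) {
            if (m.alive) count++; // follows one reference per monster
        }
        return count;
    }

    static int countAliveSoA(Monsters monsters) {
        int count = 0;
        for (boolean a : monsters.alive) {
            if (a) count++; // walks a single contiguous array
        }
        return count;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        Monster[] aos = new Monster[n];
        Monsters soa = new Monsters(n);
        for (int i = 0; i < n; i++) {
            boolean alive = i % 3 == 0;
            aos[i] = new Monster(alive, 100);
            soa.alive[i] = alive;
            soa.health[i] = 100;
        }
        System.out.println(countAliveAoS(aos) + " " + countAliveSoA(soa));
    }
}
```

Both loops return the same count. The difference is what gets dragged through the cache: the first touches every monster object just to read one flag, the second touches only the flags. Whether that matters for a given program is something only a profiler can say.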

recursivedoubts • today at 2:38 PM

When you are developing games, sometimes.

When you are developing most other applications, every byte does not matter. What matters much more is overall system architecture, collapsing unnecessary abstraction layers that some developers (especially Java developers) seem to love, and optimizing your datastore access.

As always, profile profile profile.

A company I worked for spent a violent couple of man-decades flipping our proprietary scripting language from interpreted to bytecode generation, obviously with tons of bugs and subtle semantic changes, and it ended up boosting overall system performance by about 30%. We could have done nothing over that period of time and hardware advances would have made a bigger impact.

rao-v • today at 2:50 PM

Anyways, I find it odd that major languages don't have a built-in way of asking for an array of objects to be optimized as SoA or AoS.

compiler-guy • today at 4:39 PM

SoA can be a big win, but so can plain AoS. It just depends on the access pattern.

Profiling important workloads matters. Without that everything else is guesswork.

ssiddharth • today at 11:47 AM

Slight tangent, but every ms, μs, and ns counts too. We've gotten awfully carefree with response times and wasted compute cycles.

Luff • today at 1:33 PM

Yes we should end the hateful rhetoric of most and least significant bytes. Every Byte Matters.

SuperV1234 • today at 2:47 PM

Data Oriented Design rocks. It was the subject for my CppCon 2025 keynote: https://youtube.com/watch?v=SzjJfKHygaQ

nasretdinov • today at 2:19 PM

Ideally you'd want to go further and actually store the is_alive as a bit mask and use SIMD instructions to filter out zeroes for example.
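
A rough sketch of the bit mask half of that idea, using a hypothetical `AliveBits` class. It does not use explicit SIMD; it processes 64 flags per `long` and relies on `Long.bitCount` and `Long.numberOfTrailingZeros`, which is the portable approximation available in plain Java.

```java
import java.util.function.IntConsumer;

// Hypothetical helper: packs one "alive" flag per monster into 64-bit words.
final class AliveBits {
    private final long[] words;

    AliveBits(int size) {
        this.words = new long[(size + 63) >>> 6]; // 64 flags per long
    }

    void set(int i, boolean alive) {
        long mask = 1L << (i & 63);
        if (alive) {
            words[i >>> 6] |= mask;
        } else {
            words[i >>> 6] &= ~mask;
        }
    }

    boolean get(int i) {
        return (words[i >>> 6] & (1L << (i & 63))) != 0;
    }

    // Counts alive monsters 64 flags at a time.
    int countAlive() {
        int count = 0;
        for (long w : words) {
            count += Long.bitCount(w);
        }
        return count;
    }

    // Visits only the alive indices; a word with no set bits is skipped in one comparison.
    void forEachAlive(IntConsumer action) {
        for (int w = 0; w < words.length; w++) {
            long bits = words[w];
            while (bits != 0) {
                int bit = Long.numberOfTrailingZeros(bits);
                action.accept((w << 6) + bit);
                bits &= bits - 1; // clear the lowest set bit
            }
        }
    }
}
```

This shrinks a million flags from about 1 MB (one byte each, assuming a typical `boolean[]`) to about 125 KB. The standard library's `java.util.BitSet` offers the same operations via `cardinality()` and `nextSetBit()`.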

coldcity_again • today at 11:29 AM

I love to see stuff like this. As an active Vectrex gamedev and PC/Amiga sizecoder, I strongly agree with the sentiment!

AxelWickman • today at 1:15 PM

Cool read. The AoS vs SoA speaks for itself.

readthenotes1 • today at 3:43 PM

"In that time, you get used to huge classes. New functionality? Just add a new method and field to the class"

I guess this is one reason why object-orientation has such a bad reputation.

I once worked at a bank where the OO mentor had taught people that the only object they needed was "Tape" and had them replicate the structure of data on the old spooled tape reels.

The struct of arrays reminds me of this optimization.

yas_hmaheshwari • today at 11:55 AM

Off topic: I thought I would be reading an article about the Iran war or some geopolitical news when I read fzakaria :-)

RickJWagner • today at 12:20 PM

That’s a great read. I wish more people wrote like that.

coolThingsFirst • today at 12:19 PM

Why doesn’t the machine fill up the other cache lines as well? Why is it only 64 bytes and then a miss?

burnt-resistor • today at 2:11 PM

I'm curious if anyone has had to write a JNI extension for a hot (CPU, GPU, RAM) section the JVM was unable to effectively JIT and/or optimize enough.

maoliofc • today at 2:27 PM

[flagged]
