
adrian_b yesterday at 1:54 PM

As a general rule, the amount of physical memory installed in a computer should also be proportional to the number of hardware threads provided by its CPU.

Besides the fact that the operating system may allocate some memory for each thread, a multi-threaded application that can use all available threads, for instance the compilation of a big software project, will frequently allocate working memory in proportion to the number of worker threads.

I have encountered many multi-threaded applications that need up to 2 GB per thread to work well.

This corresponds to having 64 GB for a desktop CPU with 32 threads, such as the Ryzen 9 9950X.

For the compilation example, I have seen software projects, like Chrome/Chromium and its derivatives, that need memory in proportion to the number of hardware threads. If you do not have enough, e.g. when you have only 32 GB for a 16 core/32 thread CPU, you must reduce the number of concurrent compilations, e.g. with an appropriate parameter to "make -j". That leaves some threads and cores idle, but otherwise you may encounter out-of-memory errors.
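A rough sketch of that adjustment in Python, assuming a budget of 2 GB per compile job (the figure mentioned above, not a measured value for any particular project). The helper name `jobs_for_memory` and the `reserve_gb` parameter are made up for illustration.

```python
import os


def jobs_for_memory(total_gb: float, hw_threads: int,
                    gb_per_job: float = 2.0, reserve_gb: float = 0.0) -> int:
    """Return a job count for "make -j" that should fit in memory.

    gb_per_job is an assumed budget; measure your own build to tune it.
    reserve_gb is memory set aside for the OS and other programs.
    """
    usable_gb = max(total_gb - reserve_gb, 0.0)
    by_memory = int(usable_gb // gb_per_job)
    # Never exceed the hardware thread count, never go below one job.
    return max(1, min(hw_threads, by_memory))


if __name__ == "__main__":
    threads = os.cpu_count() or 1
    # 32 GB is only an example value; substitute the installed amount.
    print(f"make -j{jobs_for_memory(32, threads)}")
```

With these assumptions, 64 GB allows all 32 threads, while 32 GB caps the build at 16 jobs and leaves the other hardware threads idle.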


Replies

rbanffy today at 11:35 AM

> when you have only 32 GB for a 16 core/32 thread CPU, you must reduce the number of concurrent compilations

Also, depending on the architecture, avoiding the odd (or even) virtual cores might free up more L2 or L3 cache for the worker threads and speed up the process.
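One way to pick such a subset on Linux, as a sketch: read each CPU's `thread_siblings_list` from sysfs and keep one logical CPU per physical core. The helper names here are hypothetical, and whether odd/even numbering actually maps to SMT siblings depends on the machine, which is why this reads the topology instead of assuming it.

```python
import glob


def parse_cpu_list(text: str) -> list:
    """Parse a Linux CPU list such as "0,16" or "0-1" into sorted CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def one_cpu_per_core(sibling_lists) -> list:
    """Keep the lowest-numbered logical CPU from each sibling group."""
    return sorted({parse_cpu_list(s)[0] for s in sibling_lists if s.strip()})


if __name__ == "__main__":
    # Linux-only sysfs path; on other systems the glob simply matches nothing.
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    lists = []
    for path in glob.glob(pattern):
        with open(path) as f:
            lists.append(f.read())
    cpus = one_cpu_per_core(lists)
    if cpus:
        # The result can be passed to "taskset -c" to pin the build.
        print(",".join(str(c) for c in cpus))
```

Whether this helps at all is something to measure per machine and per workload.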

embedding-shape yesterday at 5:12 PM

Compiling flash-attn (Flash Attention) is another great stress test for CPU+RAM, as just using 16 threads can already balloon you into 128 GB RAM usage territory. Same thing there: you need to limit concurrency when compiling it.


Neywiny yesterday at 11:15 PM

It's an important point. I went from 4c/8t and 32 GB to 16c/32t and 96 GB. Dramatically less memory per thread. Some software (looking at you, Vivado) can take incredible amounts of memory per parallel job, so some projects can only run on a subset of my cores. At least until I stepped up my work laptop to 10.66 GB/thread. That seems to be manageable.

realo yesterday at 3:26 PM

Yes! I have also observed that with compilation VMs on a big server.