Hacker News

bigyabai · last Thursday at 3:38 PM

taps the sign

  Unified Memory Is A Marketing Gimmick. Industrial-Scale Inference Servers Do Not Use It.

Replies

wren6991 · last Friday at 3:25 AM

On the M5 Pro/Max, the memory is actually attached straight to the GPU die; the CPU accesses it through the die-to-die bridge. From a memory-connectivity point of view, I don't see the difference between that and a pure GPU.

Wrt inference servers: sure, it's not cost-effective to pair such a huge CPU die and a bunch of media accelerators with the GPU die if all you care about is raw compute for inference and training. Apple SoCs are not tuned for that market, nor does Apple sell into it. I'm not building a datacentre; I'm trying to run inference on home hardware that I also want to use for other things.
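The home-inference tradeoff above can be put in rough numbers. Single-stream decoding is approximately memory-bandwidth bound, since each generated token reads roughly all of the model weights once. This is a sketch under assumed figures: the 40 GB model size and 400 GB/s bandwidth below are illustrative, not measurements of any specific machine.

```python
# Back-of-envelope ceiling on local inference speed: single-stream decode is
# roughly memory-bandwidth bound, because generating each token streams
# (approximately) the full set of model weights through the memory system.

def decode_tokens_per_sec(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on tokens/s: memory bandwidth divided by bytes read per token."""
    return bandwidth_gb_s / model_gb

# Hypothetical home setup: a 70B-parameter model quantized to ~4 bits/weight
# (~40 GB of weights) on a SoC with ~400 GB/s of unified memory bandwidth.
# Both numbers are assumptions for illustration.
print(decode_tokens_per_sec(400, 40))  # ~10 tokens/s ceiling
```

The point of the estimate: on unified-memory hardware the limiting factor for a model that fits in RAM is bandwidth, not whether the memory is "GPU memory" in the marketing sense.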

zozbot234 · last Thursday at 3:42 PM

Industrial-scale inference is moving towards LPDDR memory (alongside HBM), which is essentially what "Unified Memory" is.

rcxdude · last Thursday at 11:48 PM

Unified memory is mainly how consumer hardware gets enough GPU-accessible RAM to run larger models, because otherwise market segmentation jacks up the price substantially.
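The capacity argument here is easy to check with arithmetic: weight storage scales with parameter count times bits per weight. A sketch, using an assumed 70B-parameter model and an assumed 24 GB consumer-GPU VRAM figure (both illustrative):

```python
# Rough weight-memory footprint at different quantization levels, compared
# with the VRAM of a typical consumer GPU. All sizes are illustrative
# assumptions and ignore KV cache and runtime overhead.

def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB: params * bits / 8 bits-per-byte."""
    return params_billion * bits_per_weight / 8

consumer_vram_gb = 24  # assumed high-end consumer card

for bits in (16, 8, 4):
    size = weights_gb(70, bits)  # hypothetical 70B-parameter model
    fits = "fits" if size <= consumer_vram_gb else "does not fit"
    print(f"70B @ {bits}-bit: {size:.0f} GB -> {fits} in {consumer_vram_gb} GB VRAM")
```

Even at 4-bit quantization the weights alone exceed a 24 GB card, while a unified-memory machine with 64-128 GB of RAM can hold them, which is the segmentation point the comment is making.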
