taps the sign
"Unified Memory Is A Marketing Gimmick. Industrial-Scale Inference Servers Do Not Use It."

Industrial-scale inference is actually moving towards LPDDR memory (alongside HBM), which is essentially what "Unified Memory" is.
Unified Memory is mainly how consumer hardware gets enough GPU-accessible RAM to run larger models; otherwise, market segmentation jacks up the price substantially.
On M5 Pro/Max, the memory is attached straight to the GPU die, and the CPU accesses it through the die-to-die bridge. From a memory-connectivity point of view, I don't see the difference between that and a pure GPU.
Wrt inference servers: sure, it's not cost-effective to carry such a huge CPU die and a bunch of media accelerators alongside the GPU die if all you care about is raw compute for inference and training. Apple SoCs aren't tuned for that market, nor does Apple sell into it. But I'm not building a datacentre; I'm trying to run inference on home hardware that I also want to use for other things.