On M5 Pro/Max the memory is actually attached straight to the GPU die; the CPU accesses it through the die-to-die bridge. From a memory-connectivity standpoint, I don't see the difference between that and a pure GPU.
Wrt inference servers: sure, it's not cost-effective to carry such a huge CPU die and a bunch of media accelerators alongside the GPU die if all you care about is raw compute for inference and training. But Apple SoCs aren't tuned for that market, and Apple doesn't sell into it. I'm not building a datacentre; I'm trying to run inference on home hardware that I also use for other things.
If you're going to do unified memory, that's the way to do it, along with using higher-bandwidth RAM and beefing up your GPGPU hardware. Nvidia realized this almost a decade ago, and Apple is being dragged through the mud to learn the exact same $4 trillion lesson.
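The reason bandwidth matters so much for home inference is that token generation is memory-bound: every decoded token streams the full set of weights out of RAM, so bandwidth, not FLOPS, sets the ceiling. A rough sketch of that arithmetic (the model size and bandwidth figures below are illustrative assumptions, not measurements of any specific chip):

```python
GB = 1e9  # using decimal gigabytes throughout

def tokens_per_sec_ceiling(model_bytes: float, bandwidth_bytes_per_sec: float) -> float:
    """Upper bound on decode speed for a bandwidth-bound model:
    each token requires reading all weights once from memory."""
    return bandwidth_bytes_per_sec / model_bytes

# Hypothetical example: a 70B-parameter model quantized to ~4 bits/weight
# is roughly 35 GB of weights.
model = 35 * GB
# Hypothetical unified-memory bandwidth of ~500 GB/s.
bw = 500 * GB

print(f"~{tokens_per_sec_ceiling(model, bw):.1f} tok/s ceiling")  # ~14.3
```

Real throughput lands below this ceiling (attention KV-cache reads, scheduling overhead), but it's a decent first-order estimate of why a wider memory bus pays off directly for local inference.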