Maybe I’m lacking imagination. But how will a GPU with small-ish but fast VRAM and great compute, augment a Mac with large but slow VRAM and weak compute? The interconnect isn’t fast enough to swap layers in and out of the GPU quickly, I guess?
My Mini is the smallest model, so it actually has "small but slow VRAM" (haha!), which means what I want the GPU for is the smaller Gemmas or Qwens. Realistically, I'll probably run on an RTX 6000 Pro, but this might be fun for home.
We've seen many recent projects that stream models directly from SSD into a discrete GPU's limited VRAM on PCs.
How big a bottleneck is Thunderbolt 5 compared to an SSD? Is the 120 Gbps mode only available when linked to a monitor?
> But how will a GPU with small-ish but fast VRAM and great compute, augment a Mac with large but slow VRAM and weak compute?
It would work just like a discrete GPU when doing CPU+GPU inference: you'd run a few shared layers on the discrete GPU and keep the rest in unified memory. You'd want to minimize CPU/GPU transfers even more than usual, since a Thunderbolt 5 connection only gives you about the throughput of PCIe 4.0 x4 (roughly 8 GB/s).
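To put rough numbers on that, here's a back-of-envelope sketch, not a benchmark. It assumes the link behaves like an ideal PCIe 4.0 x4 connection at nominal peak rate and ignores latency and protocol overhead. The hidden size and per-layer weight size are made-up illustrative values, not taken from any particular model.

```python
# Back-of-envelope transfer costs over a PCIe 4.0 x4-class link.
# All figures are nominal peak rates, not measurements.

def pcie_gbytes_per_s(gt_per_s: float, lanes: int) -> float:
    """One-direction bandwidth in GB/s for PCIe 3.0 and later (128b/130b encoding)."""
    return gt_per_s * lanes * (128 / 130) / 8

def transfer_ms(num_bytes: float, gbytes_per_s: float) -> float:
    """Milliseconds to move num_bytes at the given bandwidth, ignoring latency."""
    return num_bytes / (gbytes_per_s * 1e9) * 1e3

# PCIe 4.0 runs at 16 GT/s per lane.
LINK = pcie_gbytes_per_s(16.0, 4)

# Hypothetical workload numbers, chosen only for illustration.
HIDDEN_SIZE = 4096          # assumed activation width per token
BYTES_PER_VALUE = 2         # fp16
LAYER_WEIGHT_BYTES = 400e6  # assumed size of one layer's weights

# Option A: weights stay put, only one token's activations cross the link.
activation_ms = transfer_ms(HIDDEN_SIZE * BYTES_PER_VALUE, LINK)

# Option B: the layer's weights are streamed across the link.
layer_ms = transfer_ms(LAYER_WEIGHT_BYTES, LINK)

print(f"link bandwidth:      {LINK:.2f} GB/s")
print(f"activations / token: {activation_ms:.4f} ms")
print(f"one layer's weights: {layer_ms:.1f} ms")
```

With these assumed sizes, handing activations across the link costs on the order of a microsecond per token, while streaming a single layer's weights costs roughly 50 ms. That's why you'd pin layers on each side and only pass activations between them.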