A good technical project, but honestly useless in like 90% of scenarios.
You want to use an NVidia GPU for LLM ? just buy a basic PC on second hand (the GPU is the primary cost anyway), you want to use Mac for good amount of VRAM ? Buy a Mac.
With this proposed solution you have an half-backed system, the GPU is limited by the Thunderbolt port and you don’t have access to all of NVidia tool and library, and on other hand you have a system who doesn’t have the integration of native solution like MLX and a risk of breakage in future macOS update.
There's a third option that might fit some of the "I'm on a Mac but need CUDA" cases: network-mounting an Nvidia GPU from another machine on the same LAN. The GPU stays wherever it lives (office server, lab machine, a roommate's PC), your Mac runs the CUDA workload locally without any code changes — same PyTorch/CUDA calls, just intercepted by a stub library that forwards them over the local network.
The tradeoff vs. a physical eGPU: no Thunderbolt bandwidth ceiling or cabling, but you do need to be on the same LAN and there's ~4% overhead vs. native. Doesn't help if you need the GPU while traveling, and won't fix the physical macOS driver situation for native GPU access.
Disclosure: I work on GPU Go (tensor-fusion.ai/products/gpu-go), so I'm obviously biased toward this approach — but it genuinely is a different point in the design space from eGPU.
I misunderstood eGPU for virtual GPU. But I was wrong it means external GPU.
Chicken/egg. NVidia tooling is lacking surely in part because the hardware wasn’t usable on macOS until now. Now that it’s usable that might change.