Hacker News

bertili · today at 10:04 AM

The other realistic setup is ~$20k, for a small company that needs a private AI for coding or other internal agentic use: two Mac Studios connected over Thunderbolt 5 RDMA.


Replies

Barathkanna · today at 10:14 AM

That won’t realistically work for this model. Even with only ~32B active params, a 1T-scale MoE still needs the full expert set available for fast routing, which means hundreds of GB to TBs of weights resident. Mac Studios don’t share unified memory across machines, Thunderbolt isn’t remotely comparable to NVLink for expert exchange, and bandwidth becomes the bottleneck immediately. You could maybe load fragments experimentally, but inference would be impractically slow and brittle. It’s a very different class of workload than private coding models.
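A rough back-of-envelope sketch of the memory argument (my own assumed numbers, not benchmarks — ~1T total parameters and 512 GB per Mac Studio are assumptions, not specs from the thread):

```python
# Why a ~1T-parameter MoE strains a 2x Mac Studio setup: even though
# only ~32B params are *active* per token, routing needs the full
# expert set resident, so the whole weight footprint must fit somewhere.

def weight_footprint_gb(total_params_billions, bytes_per_param):
    """GB needed to keep all weights resident at a given precision."""
    return total_params_billions * 1e9 * bytes_per_param / 1e9

total_params_billions = 1000  # assumed ~1T total parameters
for bits, label in [(16, "fp16/bf16"), (8, "int8"), (4, "int4")]:
    gb = weight_footprint_gb(total_params_billions, bits / 8)
    print(f"{label:>9}: ~{gb:,.0f} GB resident")

# Two 512 GB Mac Studios give ~1 TB of memory combined, so only the
# 4-bit case fits with headroom -- and the weights are then split
# across two machines that must exchange expert traffic over
# Thunderbolt rather than sharing one unified memory pool.
```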

embedding-shape · today at 10:12 AM

I'd love to see the prompt processing speed difference between 16× H100 and 2× Mac Studio.

zozbot234 · today at 10:12 AM

That's great for affordable local use, but it'll be slow: even with a proper multi-node inference setup, the Thunderbolt link will be a comparative bottleneck.
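To put a number on the link bottleneck, here's a minimal sketch using nominal link rates (Thunderbolt 5 at ~80 Gb/s and NVLink 4 at ~900 GB/s per H100; the 64 MB per-token payload is a purely hypothetical figure for illustration):

```python
# Compare time to move one activation payload across the inter-node
# link. Nominal rates, not measured throughput.

TB5_GB_PER_S = 80 / 8     # Thunderbolt 5: ~80 Gb/s  -> ~10 GB/s
NVLINK_GB_PER_S = 900     # NVLink 4 (H100): ~900 GB/s per GPU

def transfer_ms(megabytes, gb_per_s):
    """Milliseconds to move `megabytes` over a link at `gb_per_s`."""
    return megabytes / 1024 / gb_per_s * 1000

payload_mb = 64  # hypothetical cross-node activation traffic per step
print(f"Thunderbolt 5: {transfer_ms(payload_mb, TB5_GB_PER_S):.2f} ms")
print(f"NVLink 4:      {transfer_ms(payload_mb, NVLINK_GB_PER_S):.3f} ms")
```

Roughly a 90x gap per transfer, and that cost lands on every step that crosses the node boundary, which is why the link, not compute, sets the pace.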