Curious what the minimal reasonable hardware would be to deploy this locally?

zmmmmm • today at 7:26 AM • 3 replies

Replies

I read "reasonable" as meaning fast enough to actually use this as intended (in agentic setups). In that case, you're looking at a minimum of $70-100k for hardware (8x 6000 PRO + all the other pieces to make it work). The model comes with a native INT4 quant, so ~600GB for the weights alone. An 8x 96GB setup (768GB total) would leave you ~160GB for KV cache.
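To make the memory arithmetic explicit, here is a rough sketch of the budget using only the figures quoted above. The function name and the `overhead_gb` knob are hypothetical, added for illustration; real headroom also depends on activations, framework buffers, and fragmentation, which is presumably why the estimate is rounded down to ~160GB.

```python
# Back-of-the-envelope VRAM budget using the numbers from the comment.
# kv_cache_headroom_gb and overhead_gb are illustrative names, not from any library.

NUM_GPUS = 8        # 8x 6000 PRO, per the comment
GB_PER_GPU = 96     # VRAM per card, per the comment
WEIGHTS_GB = 600    # approximate INT4 weight footprint, per the comment


def kv_cache_headroom_gb(num_gpus, gb_per_gpu, weights_gb, overhead_gb=0.0):
    """Return the memory left for KV cache after the weights are loaded.

    overhead_gb is an assumed allowance for runtime buffers; it defaults to 0
    because the comment does not give a figure for it.
    """
    total_gb = num_gpus * gb_per_gpu
    return total_gb - weights_gb - overhead_gb


if __name__ == "__main__":
    total = NUM_GPUS * GB_PER_GPU
    headroom = kv_cache_headroom_gb(NUM_GPUS, GB_PER_GPU, WEIGHTS_GB)
    print(f"total VRAM: {total} GB, headroom for KV cache: {headroom:.0f} GB")
```

With no overhead allowance this gives 768GB total and 168GB of headroom, which lines up with the ~160GB figure once a few GB of runtime overhead are set aside.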

You can of course "run" this on cheaper hardware, but the speeds will not be suitable for actual use (i.e. minutes for a simple prompt, tens of minutes per turn for high-context sessions).

simonw • today at 10:07 AM

Models of this size can usually be run using MLX on a pair of 512GB Mac Studio M3 Ultras, which are about $10,000 each, so $20,000 for the pair.

tosh • today at 10:15 AM

I think you can put together a bunch of Apple silicon Macs with enough RAM

e.g. in an office or coworking space

800-1000 GB of RAM perhaps?
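As a rough sketch of how many machines that would take: the 800-1000 GB target comes from the comment above, while the per-machine RAM sizes are assumptions picked for illustration, and `machines_needed` is a hypothetical helper. It counts raw pooled RAM only and ignores the memory each machine needs for the OS, the KV cache, and interconnect overhead.

```python
import math

# Target pooled RAM range from the comment (GB).
TARGET_LOW_GB = 800
TARGET_HIGH_GB = 1000


def machines_needed(target_gb, gb_per_machine):
    """Smallest number of machines whose combined RAM reaches target_gb."""
    return math.ceil(target_gb / gb_per_machine)


if __name__ == "__main__":
    # Per-machine RAM sizes below are assumed examples, not a product list.
    for ram_gb in (128, 256, 512):
        low = machines_needed(TARGET_LOW_GB, ram_gb)
        high = machines_needed(TARGET_HIGH_GB, ram_gb)
        print(f"{ram_gb} GB per machine: {low} to {high} machines")
```

Under these assumptions, 512 GB machines reach the target with two units, while 128 GB machines would need seven or eight.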
