Hacker News

Barathkanna · today at 9:55 AM · 4 replies

A realistic setup for this would be 16× H100 80GB with NVLink. That comfortably handles the ~32B active expert parameters plus KV cache without extreme quantization. Cost-wise we are looking at roughly $500k–$700k upfront or $40–60/hr on-demand, which makes it clear this model is aimed at serious infra teams, not casual single-GPU deployments. I'm curious how API providers will price tokens on top of that hardware reality.
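The buy-vs-rent numbers above imply a break-even point worth spelling out. A quick sketch using the figures quoted in the comment (the 720 hours/month conversion is the only added assumption):

```python
# Break-even between buying and renting, using the figures quoted above.
upfront_low, upfront_high = 500_000, 700_000  # $ purchase price range
rate_low, rate_high = 40, 60                  # $/hr on-demand range

# Hours of continuous rental before buying wins:
best_case = upfront_low / rate_high    # cheap hardware, pricey rental
worst_case = upfront_high / rate_low   # pricey hardware, cheap rental
print(f"{best_case:.0f}-{worst_case:.0f} hours "
      f"(~{best_case/720:.0f}-{worst_case/720:.0f} months of 24/7 use)")
```

So buying only wins if you keep the cluster saturated for roughly a year or more; below that utilization, on-demand is the cheaper path.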


Replies

wongarsu · today at 11:40 AM

The weights are int4, so you'd only need 8× H100.
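The arithmetic behind this, assuming roughly 1T total parameters for the model (an illustrative figure, not an official spec):

```python
# Why int4 weights fit on 8x H100 80GB (assumed ~1T total params).
total_params = 1e12
weights_gb = total_params * 0.5 / 1e9  # int4 = 4 bits = 0.5 bytes/param
vram_gb = 8 * 80                       # 8x H100 80GB

print(weights_gb, vram_gb)  # 500 GB of weights vs. 640 GB of VRAM
```

That leaves ~140 GB of headroom for KV cache and activations, which is tight but workable; at FP8 or BF16 the weights alone would blow past 640 GB, hence the 16-GPU figure upthread.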

a2128 · today at 1:05 PM

You don't need to wait and see: Kimi K2 has the same hardware requirements and has several providers on OpenRouter:

https://openrouter.ai/moonshotai/kimi-k2-thinking
https://openrouter.ai/moonshotai/kimi-k2-0905
https://openrouter.ai/moonshotai/kimi-k2-0905:exacto
https://openrouter.ai/moonshotai/kimi-k2

Generally it seems to be in the neighborhood of $0.50 per 1M input tokens and $2.50 per 1M output tokens.
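At those rates, a monthly bill is easy to estimate. A sketch with a hypothetical workload (the token volumes are made up for illustration; only the per-token rates come from the comment above):

```python
# Monthly cost at the quoted OpenRouter-style rates.
input_rate, output_rate = 0.50, 2.50   # $ per 1M tokens (from the comment)
monthly_in, monthly_out = 2_000, 500   # M tokens/month (hypothetical workload)

cost = monthly_in * input_rate + monthly_out * output_rate
print(f"${cost:,.0f}/month")
```

Even a fairly heavy 2.5B-tokens-per-month workload lands in the low thousands of dollars, far below the $40–60/hr it costs to rent the hardware yourself around the clock.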

reissbaker · today at 10:37 AM

Generally speaking, 8xH200s will be a lot cheaper than 16xH100s, and faster too. But both should technically work.

bertili · today at 10:04 AM

The other realistic setup is ~$20k: two Mac Studios connected over Thunderbolt 5 RDMA, for a small company that needs a private AI for coding or other internal agentic use.
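Whether that setup fits comes down to unified memory. A sketch assuming two 512 GB Mac Studios and int4 weights for a ~1T-parameter model (both figures are assumptions for illustration):

```python
# Does a two-Mac-Studio cluster hold the int4 weights?
unified_gb = 2 * 512                  # assumed 512 GB unified memory each
weights_gb = 1e12 * 0.5 / 1e9         # assumed ~1T params at 4 bits/param

headroom = unified_gb - weights_gb
print(headroom)  # GB left over for KV cache, activations, and the OS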
