martinald • today at 12:00 AM • 1 reply

Yes, 32B dense is a weird one to choose.

But in reality, 32B dense is very similar* to 32B activated on an MoE in terms of inference cost. And I strongly suspect that e.g. Opus is around that level of active params.

A 284B-A13B model (284B total params, 13B active) is, at scale, almost certainly cheaper to serve than a 32B dense model.

*As you can shard the model across multiple GPUs at scale. But in reality you lose some efficiency to GPU coordination and expert routing.
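For anyone who wants to redo the napkin math, here is a rough sketch. It assumes the common rule of thumb of about 2 FLOPs per active parameter per generated token, and 1 byte per parameter for weight storage (an 8-bit assumption, not something stated above). It ignores attention, KV cache, batching, and the coordination and routing overhead from the footnote, so treat the numbers as ballpark only. The function names are made up for illustration.

```python
def flops_per_token(active_params: float) -> float:
    """Rough forward-pass compute: ~2 FLOPs per active parameter per token (rule of thumb)."""
    return 2.0 * active_params


def weight_bytes(total_params: float, bytes_per_param: float = 1.0) -> float:
    """Memory needed just to hold the weights; 1 byte/param is an assumed 8-bit format."""
    return total_params * bytes_per_param


# Parameter counts taken from the comment: 32B dense vs. 284B total / 13B active.
models = {
    "32B dense": {"total": 32e9, "active": 32e9},
    "284B-A13B MoE": {"total": 284e9, "active": 13e9},
}

for name, m in models.items():
    gflops = flops_per_token(m["active"]) / 1e9
    gib = weight_bytes(m["total"]) / 2**30
    print(f"{name}: ~{gflops:.0f} GFLOPs/token, ~{gib:.0f} GiB of weights")
```

Under these assumptions the MoE needs less compute per token but far more memory for weights, which is why the comparison only works once the model is sharded across multiple GPUs.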

Replies

breput • today at 12:46 AM

That's good information. I couldn't even begin to run DeepSeek Flash on my system, but if you're assuming multiple GPUs, that is going to affect the napkin math.
