Hacker News

zozbot234, yesterday at 8:17 AM

SOTA models are reportedly MoE, not dense.


Replies

bigyabai, yesterday at 5:37 PM

A 5T MoE model is still bottlenecked by streaming weights from SSD, in addition to compute bottlenecks during prefill and decode.
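
The SSD-streaming point can be made concrete with a back-of-envelope estimate. The sketch below is illustrative only: the function name, the active-parameter count, the quantization width, and the SSD bandwidth are all assumptions, not figures from the thread. It assumes that, in the worst case, every active expert weight not already cached in RAM has to be read from SSD for each decoded token.

```python
def decode_tokens_per_second(active_params, bytes_per_param,
                             ssd_bytes_per_second, cache_hit_fraction=0.0):
    """Upper bound on decode speed when uncached active weights
    are streamed from SSD for every token (hypothetical model)."""
    # Bytes that must come off the SSD for one token.
    bytes_per_token = active_params * bytes_per_param * (1.0 - cache_hit_fraction)
    if bytes_per_token == 0:
        return float("inf")  # everything is cached; SSD is not the limit
    return ssd_bytes_per_second / bytes_per_token


# Assumed numbers, chosen only for illustration:
# 100B active parameters per token, 4-bit weights (0.5 bytes each),
# and an NVMe drive sustaining 7 GB/s of reads.
rate = decode_tokens_per_second(
    active_params=100e9,
    bytes_per_param=0.5,
    ssd_bytes_per_second=7e9,
)
print(f"{rate:.2f} tokens/s upper bound from SSD bandwidth alone")
```

Under these assumed numbers the bound is well below one token per second, and it only improves in proportion to how much of the active weight set stays cached. MoE sparsity reduces the bytes touched per token, but it does not remove the storage bandwidth limit.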
