Hacker News

Havoc | today at 5:05 PM | 1 reply

Quite a niche release. The MoE outperforms it on benchmark scores and will likely be faster thanks to its fewer active weights. So this really only makes sense for specific RAM-constrained applications that can’t fit a quantized MoE.
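A rough way to see why fewer active weights usually means faster generation: single-stream decoding tends to be limited by how many weight bytes are read per token. The sketch below assumes the MoE is the 26B-A3B mentioned in the reply (read as ~3B parameters active per token) and the dense model has ~12B parameters, all active. It ignores KV cache, activations, router cost and kernel efficiency, so treat it as a proxy, not a speed prediction.

```python
# Rough decode-speed proxy: bytes of weights read per generated token.
# Assumptions (not from any official spec): the MoE activates ~3e9 params
# per token ("A3B"), the dense model uses all ~12e9 params on every token.
# Ignores KV cache, activations, router overhead and kernel efficiency.

def weight_bytes_per_token(active_params: float, bits_per_weight: float) -> float:
    """Bytes of weights touched per token if every active weight is read once."""
    return active_params * bits_per_weight / 8

MOE_ACTIVE = 3e9     # assumed active params of the 26B-A3B MoE
DENSE_ACTIVE = 12e9  # assumed params of the 12B dense model (all active)

moe = weight_bytes_per_token(MOE_ACTIVE, 16)
dense = weight_bytes_per_token(DENSE_ACTIVE, 16)
print(f"MoE:   {moe / 1e9:.1f} GB/token")
print(f"Dense: {dense / 1e9:.1f} GB/token")
print(f"Dense reads {dense / moe:.0f}x more weight bytes per token")
```

At equal precision the dense model reads about four times as many weight bytes per token under these assumptions, which is the intuition behind "likely faster".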


Replies

dist-epoch | today at 5:10 PM

The un-quantized MoE outperforms it.

But between a 4-bit 26B-A3B and an 8-bit 12B, which need about the same (V)RAM, it's unclear which one will win, especially given that one is MoE and the other is dense.
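The "same (V)RAM requirement" claim is easy to check with back-of-the-envelope arithmetic. In the usual setup all experts of an MoE must stay resident, so memory follows total parameters, not active ones. This sketch assumes ~26B total parameters for the 26B-A3B and ~12B for the dense model, and that an N-bit quant costs exactly N bits per weight; real quant formats add some overhead for scales, and KV cache is not counted.

```python
# Approximate weight memory for the configurations being compared.
# Assumptions: "26B-A3B" has ~26e9 total params, "12B" has ~12e9,
# N-bit quantization = exactly N bits per weight (no scale overhead),
# and KV cache / activations are excluded.

def weight_gb(total_params: float, bits_per_weight: float) -> float:
    """Weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9

configs = {
    "26B-A3B MoE @ 4-bit": (26e9, 4),
    "12B dense @ 8-bit": (12e9, 8),
    "26B-A3B MoE @ 16-bit": (26e9, 16),
    "12B dense @ 16-bit": (12e9, 16),
}

for name, (params, bits) in configs.items():
    print(f"{name:22s} ~{weight_gb(params, bits):5.1f} GB")
```

Under these assumptions the 4-bit MoE (~13 GB) and the 8-bit dense model (~12 GB) land in the same memory tier, while at 16 bit the MoE needs more than twice as much (~52 GB vs ~24 GB), so 16-bit benchmark results don't directly answer the quantized comparison.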

All the launch benchmarks are at 16-bit precision.