Hacker News

otabdeveloper4 today at 6:35 AM

MoE and the like are basically performance enhancements; they don't make the model smarter.


Replies

fizx today at 8:59 PM

Performance enhancements are what allow you to train a bigger model.

yababa_y today at 7:23 AM

Separately trained experts can perform better in their activated regime, and that DOES result in a smarter model. The Claude system cards talk about this, and there is e.g. https://openreview.net/forum?id=iydmH9boLb to read...

jmalicki today at 1:16 PM

Performance enhancements are huge though.

If you can make the existing model faster, you can spend the inference budget you save on a bigger model, which then makes it smarter.

A lot of how smart the models can be comes down to budget. If you can make your existing thing cheaper, you can instead make it bigger for the same price.
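To put rough numbers on the budget argument, here is a back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the layer sizes, the 8-expert/top-2 routing, and the helper names `ffn_params` and `moe_ffn_params` are hypothetical, and it only counts feed-forward weights (no attention, no biases). It is not how any particular model is built.

```python
# Rough parameter counting for one feed-forward (FFN) block.
# All sizes below are made-up illustrative values, not from a real model.

def ffn_params(d_model: int, d_ff: int) -> int:
    """Weights in a plain dense FFN: up-projection plus down-projection."""
    return 2 * d_model * d_ff


def moe_ffn_params(d_model: int, d_ff: int, n_experts: int, top_k: int):
    """Return (total, active) weights for a hypothetical MoE FFN block.

    total  = weights that must be stored (all experts + router)
    active = weights actually used per token (top_k experts + router)
    """
    per_expert = ffn_params(d_model, d_ff)
    router = d_model * n_experts  # one linear layer scoring each expert
    total = n_experts * per_expert + router
    active = top_k * per_expert + router
    return total, active


d_model, d_ff = 4096, 16384  # assumed sizes
dense = ffn_params(d_model, d_ff)
total, active = moe_ffn_params(d_model, d_ff, n_experts=8, top_k=2)

print(f"dense FFN weights:      {dense:,}")
print(f"MoE total weights:      {total:,}")
print(f"MoE active per token:   {active:,}")
print(f"total / active ratio:   {total / active:.2f}")
```

Under these assumptions the per-token compute tracks the active count, while the capacity of the model tracks the total count, which is the "bigger for the same price" trade in a nutshell.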
