
mmmllm · yesterday at 12:30 PM

Isn't that essentially how MoE models already work? Besides, if that were infinitely scalable, wouldn't we already have a tier of super-smart models available at very high cost?

In any case, this would only apply to very few use cases. For a lot of basic customer-care work, programming, and quick research, I would say LLMs are already quite good without running them 100X.


Replies

mcrutcher · yesterday at 1:20 PM

MoE models are pretty poorly named, since all the "experts" have the same architecture. They're probably better described as "sparse activation" models. "MoE" implies some sort of heterogeneous experts that a "thalamus router" is trained to use, but that's not how they work.
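A rough sketch of that in PyTorch (my own toy example; the sizes, expert count, and top-k are placeholders, not any particular model): every "expert" is an identical feed-forward block, and a small learned router picks the top-k of them per token, so most parameters sit idle on any given forward pass.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SparseMoE(nn.Module):
        def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
            super().__init__()
            # Every expert is the same feed-forward block; nothing heterogeneous.
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
                for _ in range(n_experts)
            )
            # Learned gating, trained end to end with the rest of the network.
            self.router = nn.Linear(d_model, n_experts)
            self.top_k = top_k

        def forward(self, x):  # x: (tokens, d_model)
            gate_probs = F.softmax(self.router(x), dim=-1)
            weights, idx = torch.topk(gate_probs, self.top_k, dim=-1)
            weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over chosen experts
            out = torch.zeros_like(x)
            for k in range(self.top_k):
                for e, expert in enumerate(self.experts):
                    mask = idx[:, k] == e
                    if mask.any():  # only the selected experts run for these tokens
                        out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
            return out

    x = torch.randn(16, 512)
    print(SparseMoE()(x).shape)  # torch.Size([16, 512])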

amelius · yesterday at 1:19 PM

> if that were infinitely scalable, wouldn't we have a subset of super-smart models already at very high cost

The compute/intelligence curve is not a straight line. It's probably more like a curve that saturates, say at 70% of human intelligence: more compute still means more intelligence, but you never reach 100% of human level; it saturates well below that.
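As a toy illustration only (the functional form and constants are made up, just to show the shape): something like cap_max * (1 - exp(-k * compute)) keeps rising with compute but never crosses its ceiling.

    import math

    def capability(compute, cap_max=0.7, k=0.5):
        # Rises monotonically with compute but asymptotes at cap_max,
        # standing in for "saturates somewhere below human level".
        return cap_max * (1.0 - math.exp(-k * compute))

    for c in (1, 10, 100, 1000):
        print(f"compute={c:>4}: capability={capability(c):.3f}")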

mirekrusin · yesterday at 12:39 PM

MoE is something different - it's a technique to activate just a small subset of parameters during inference.
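Back-of-the-envelope illustration (the counts and sizes below are generic placeholders, not a specific model's configuration): with 8 experts per layer and top-2 routing, only 2 of the 8 expert blocks run for each token, so the parameters active per token are far fewer than the total.

    n_experts, top_k = 8, 2   # experts per MoE layer; experts used per token
    expert_params = 0.2e9     # parameters in each expert block (hypothetical)
    shared_params = 1.0e9     # attention, embeddings, etc., always active (hypothetical)

    total = shared_params + n_experts * expert_params
    active = shared_params + top_k * expert_params
    print(f"total {total/1e9:.1f}B params, "
          f"~{active/1e9:.1f}B ({active/total:.0%}) active per token")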

Whatever is good enough now can be much better for the same cost (time, compute, money). People will always choose better over worse.
