Isn't that essentially how the MoE models already work? Besides, if that were infinitely scalable, wouldn't we have a subset of super-smart models already at very high cost?
Also, this would only apply to very few use cases. For a lot of basic customer care work, programming, and quick research, I'd say LLMs are already quite good without running them at 100x the compute.
> if that were infinitely scalable, wouldn't we have a subset of super-smart models already at very high cost
The compute/intelligence curve is not a straight line. It's more likely a curve that saturates, somewhere around 70% of human intelligence. More compute still means more intelligence, but you'll never reach 100%: it saturates well below human level.
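A toy way to picture that claim (the numbers and the exponential form are made up, nothing empirical): a saturating curve where extra compute always helps a little, but the asymptote sits below human level.

```python
# Illustrative only: a saturating compute -> "intelligence" curve with an
# arbitrary ceiling at 0.7 (i.e. 70% of "human level") and arbitrary scale.
import math

def intelligence(compute, ceiling=0.7, scale=1.0):
    # Asymptotically approaches `ceiling` as compute grows; never exceeds it.
    return ceiling * (1 - math.exp(-compute / scale))

for c in [1, 10, 100, 1000]:
    print(c, round(intelligence(c), 3))
# 1 -> 0.442, 10 -> 0.7, 100 -> 0.7, 1000 -> 0.7: diminishing returns toward the ceiling
```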
MoE is something different - it's a technique to activate just a small subset of parameters during inference.
Whatever is good enough now can be much better for the same cost (time, compute, actual dollars). People will always choose better over worse.
MoE models are pretty poorly named, since all the "experts" have the same architecture. They're probably better described as "sparse activation" models. "MoE" implies some sort of heterogeneous experts that a "thalamus router" is trained to use, but that's not how they work.
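To make that concrete, here's a rough PyTorch sketch of what top-k routing over identical feed-forward "experts" typically looks like (layer sizes, names, and the loop-based dispatch are all just for illustration, not any particular model's implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, k=2):
        super().__init__()
        self.k = k
        # Homogeneous "experts": identical feed-forward blocks, differing only in learned weights.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])
        # The "router" is just a linear layer scoring experts per token.
        self.gate = nn.Linear(d_model, n_experts)

    def forward(self, x):                      # x: (tokens, d_model)
        scores = self.gate(x)                  # (tokens, n_experts)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(topk_scores, dim=-1)
        out = torch.zeros_like(x)
        # Only the k selected experts run for each token -> sparse activation.
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

x = torch.randn(4, 512)
print(SparseMoELayer()(x).shape)  # torch.Size([4, 512])
```

The gate is just a linear scorer; each token passes through only k of the n identical blocks, which is where the inference savings come from. There's nothing heterogeneous about the experts themselves.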