Hacker News

5o1ecist · yesterday at 9:42 AM · 4 replies

> I doubt frontier models have actually substantially grown in size in the last 1.5 years

... and you'd most likely be correct in that doubt, given the evidence we have.

What has improved disproportionately more than the software or hardware side is density[1] per parameter, suggesting there's a "Moore's Law"-esque relationship between the number of parameters, density per parameter, and compute requirements. As long as more and more information/abilities can be squeezed into the same number of parameters, inference will become cheaper and cheaper, quicker and quicker.

I write "quicker and quicker" because, alongside improvements in density, there will still be additional architectural, software, and hardware improvements. It's almost as if it's going exponential and we're heading for a so-called Singularity.
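
Rough sketch of what I mean by the compounding (every rate below is a number I made up, not a measurement):

    # Back-of-the-envelope only: all rates are made-up assumptions.
    density_gain = 2.0   # assumed: capability per parameter doubles yearly
    hw_gain = 1.4        # assumed: cost per FLOP improves 40% yearly
    sw_gain = 1.3        # assumed: kernels/serving improve 30% yearly

    cost = 1.0  # relative inference cost for a fixed capability, year 0
    for year in range(1, 6):
        cost /= density_gain * hw_gain * sw_gain
        print(f"year {year}: relative cost {cost:.4f}")
    # Multiplicative compounding is what makes the curve look exponential:
    # ~3.6x cheaper per year under these assumed rates.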

Since it's far more efficient, and more "intelligent", to have many small models competing with and correcting each other in parallel for the best possible answer, there simply is no need for giant, inefficient, monolithic monsters. In code, the idea is roughly the sketch below.
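
A minimal sketch (query_model is a hypothetical stand-in, not a real API, and majority voting is just the simplest possible aggregation):

    # Minimal sketch of "many small models in parallel, pick the best".
    from collections import Counter
    from concurrent.futures import ThreadPoolExecutor

    def query_model(model_name: str, prompt: str) -> str:
        # placeholder: call a small local model here
        return "4"  # pretend each small model answered the prompt

    def best_of(models: list[str], prompt: str) -> str:
        with ThreadPoolExecutor(max_workers=len(models)) as pool:
            answers = list(pool.map(lambda m: query_model(m, prompt), models))
        # simplest aggregation: majority vote; a learned scorer or a
        # cross-critique round would replace this in practice
        return Counter(answers).most_common(1)[0][0]

    print(best_of(["small-a", "small-b", "small-c"], "What is 2+2?"))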

They ain't gonna tell us that, though, because then we'd know that we don't need them anymore.

[1] for lack of a better term.


Replies

red75prime · yesterday at 2:35 PM

Obviously, there's a limit to how much you can squeeze into a single parameter. I guess the low-hanging fruit will be picked soon, and scaling will continue with algorithmic improvements in training, like [1], to keep the training compute feasible.

I take "you can't have human-level intelligence without roughly the same number of parameters (hundreds of trillions)" as a null hypothesis: true until proven otherwise.
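
For intuition on that per-parameter limit, a crude bit-counting bound (my illustrative numbers, nothing more):

    # Crude ceiling, by bit-counting alone: a parameter stored in b bits
    # can carry at most b bits, so capacity <= bits_per_param * n_params.
    # (Real extractable capacity is far lower; numbers are illustrative.)
    bits_per_param = 16  # e.g. bf16 weights
    for n_params in (8e9, 70e9, 100e12):
        gb = bits_per_param * n_params / 8e9  # bits -> gigabytes
        print(f"{n_params:.0e} params -> at most {gb:,.0f} GB of information")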

[1] https://arxiv.org/html/2602.15322v1

stavros · yesterday at 10:01 AM

Why don't we need them? If I need to run a hundred small models to get a given level of quality, what's the difference to me between that and running one large model?

show 2 replies
somewhereoutth · yesterday at 10:10 AM

I'd suggest that a measure like 'density[1]/parameter', as you put it, will asymptotically approach a hard theoretical limit (one that probably isn't much higher than what we already have). So quite unlike Moore's Law. The shape difference is easy to see with toy curves (arbitrary constants, purely illustrative):

    # Toy contrast: Moore's-Law-style exponential vs. a curve that
    # saturates at a hard limit. L and k are arbitrary constants.
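    import math

    L = 4.0   # assumed hard ceiling on density/parameter
    k = 0.5   # assumed saturation rate
    for t in range(0, 11, 2):
        exponential = 2 ** (t / 2)               # doubles every 2 steps
        saturating = L * (1 - math.exp(-k * t))  # approaches L, never passes
        print(f"t={t:2d}  exp={exponential:7.1f}  saturating={saturating:.2f}")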

show 1 reply
block_dagger · yesterday at 10:20 AM

Doesn't the widely accepted Bitter Lesson say the exact opposite about specialized vs. generalized models?

show 2 replies