Hacker News

WithinReason · today at 3:16 PM · 2 replies

Here is a paper that made a similar observation recently:

https://www.alphaxiv.org/abs/2512.19941


Replies

dnhkng · today at 3:43 PM

Thanks for the link!

I think these models have to learn to use their parameters efficiently, and the best way to do that is to 'evolve' (yes, a bad word for it) structures over pretraining time. Unfortunately, they don't have a way to access these structures 'from the inside'. I hope this new approach lets us boost performance in a more experimentally rigorous way.

tgw43279w · today at 3:24 PM

Very cool, thanks for sharing! Recovering 96% using just two blocks on IMN-1k, wow!