Hacker News

naasking today at 3:10 PM

This layer duplication strikes me as a bit of a "poor man's" version of looped language models:

https://ouro-llm.github.io/

Pretty cool though. LLM brain surgery.


Replies

dnhkng today at 3:51 PM

Agreed, but one thing to note:

From the experiments, I really think that 'organs' (not sure what to call them) develop during massive pretraining. This also means that looping the entire model may not be efficient. Maybe a better approach is [linear input section -> loop 1 -> linear section -> loop 2 -> linear section -> ... -> loop n -> linear output]?

This would give 'organs' space to develop.
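A minimal sketch of that alternating layout, assuming the idea is to interleave unique "linear" sections with weight-tied loops. All names are illustrative, and a toy elementwise function stands in for a real transformer layer:

```python
def make_layer(scale):
    """Stand-in for one transformer layer (toy op: x -> scale*x + 1)."""
    return lambda xs: [scale * x + 1.0 for x in xs]

def looped(layer, n_iters):
    """Apply the SAME layer n_iters times (a weight-tied loop)."""
    def block(xs):
        for _ in range(n_iters):
            xs = layer(xs)
        return xs
    return block

def build_model(n_loops, loop_iters):
    """[linear input -> loop 1 -> linear -> ... -> loop n -> linear output]."""
    stages = [make_layer(1.0)]  # linear input section (unique weights)
    for _ in range(n_loops):
        stages.append(looped(make_layer(0.5), loop_iters))  # shared-weight loop
        stages.append(make_layer(1.0))  # linear section between/after loops
    return stages

def forward(stages, xs):
    for stage in stages:
        xs = stage(xs)
    return xs
```

The unique linear sections would give 'organs' fixed, non-repeated capacity to specialise in, while the loops provide iterated depth with fewer parameters.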
