Hacker News

Lerc · today at 6:35 PM

I have had broadly the same intuitions on the use of middle layers, but haven't had much luck with the tiny models that I can run on my hardware.

There's a video on YouTube about looping-layer models: https://www.youtube.com/watch?v=pDsTcrRVNc0

After watching it, I poured some thoughts off the top of my head into a comment which, of course, promptly sank without a trace. I'll repost the gist of them here.

If you gain a benefit from looping layers, then at some level every layer of parameters sits both in front of and behind every other layer. The conclusion must be that the order of the layers does not need to be fixed at all.

If you cycle through the layers multiple times, are you doing so for the benefit of a particular layer on a particular problem? If so, can you skip the other layers that add nothing on repetition? If you can skip (and know when to skip), and you can repeat (and know when to repeat), then the fixed ordering has effectively disappeared.

What you would need is a mechanism that decides which layer is needed next. Is that not then a looping single-layer MoE model? It would store the layers as a wide set of selectable options rather than a deep stack of unconditional layers. At each iteration you would pick what the next layer should be (or exit the loop); the threshold for exiting drops each iteration so the loop always eventually terminates, with a tunable 'how hard to think' knob to adjust that threshold.
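A minimal sketch of that control flow, with everything hypothetical: the "layers" are toy functions standing in for transformer blocks, and the router is a random stand-in for whatever learned mechanism would actually score the options. Only the loop structure, the layer-or-exit choice, and the decaying exit threshold come from the idea above.

```python
import random

def make_layers(n):
    # Toy "layers": each nudges the hidden state differently.
    # Real ones would be transformer blocks with learned weights.
    return [lambda h, k=k: [x + 0.1 * (k + 1) for x in h] for k in range(n)]

def route(hidden, n_layers, rng):
    # Stand-in router: one score per layer, plus a final "exit" score.
    # A learned router would condition these on the hidden state.
    return [rng.random() for _ in range(n_layers + 1)]

def run(hidden, layers, max_steps=32, exit_decay=0.05, seed=0):
    rng = random.Random(seed)
    exit_threshold = 1.0  # the 'how hard to think' knob
    for step in range(max_steps):
        scores = route(hidden, len(layers), rng)
        # Exit when the exit score clears the threshold; since the
        # threshold drops every iteration, the loop always terminates.
        if scores[-1] > exit_threshold:
            break
        exit_threshold -= exit_decay
        # Otherwise apply whichever layer the router scored highest.
        best = max(range(len(layers)), key=scores.__getitem__)
        hidden = layers[best](hidden)
    return hidden, step

hidden, steps = run([0.0, 1.0], make_layers(4))
```

The wide-not-deep framing shows up in `make_layers`: the layers are a flat pool the router draws from, not a pipeline, so repetition and skipping fall out of the same selection step.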


Replies

janalsncm · today at 8:09 PM

That is an interesting idea. I suspect that if we relax the constraint that the layers in a loop run in order, there is a combinatorial explosion issue.

But we could still try it out: randomize the order in which we call the transformer blocks and see whether it affects performance. If not, that's extremely interesting.
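A toy sketch of that experiment, under obvious assumptions: the "blocks" here are small non-commuting functions rather than real transformer blocks, and "performance" is just the output value. In a real test you would permute the blocks of a pretrained model and compare benchmark scores instead.

```python
import random

def blocks():
    # Toy non-commuting blocks, so reordering can actually change the result.
    return [
        lambda h: h * 2.0,  # stand-in block 1
        lambda h: h + 1.0,  # stand-in block 2
        lambda h: h * h,    # stand-in block 3
    ]

def forward(h, block_list):
    # Apply the blocks sequentially, like a transformer's layer stack.
    for block in block_list:
        h = block(h)
    return h

original = blocks()
shuffled = original[:]
random.Random(0).shuffle(shuffled)  # the randomized layer order

out_ordered = forward(3.0, original)   # ((3 * 2) + 1) ** 2 = 49.0
out_shuffled = forward(3.0, shuffled)
# If ordered and shuffled runs barely differ (on real benchmarks, not toy
# outputs), the fixed ordering is doing little work; a large gap means
# layer order matters after all.
```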