Hacker News

dnhkng · yesterday at 7:29 PM · 0 replies

There are similar patterns in the models from all the big labs. I think the transformer layer stack starts out 'undifferentiated', analogous to stem cells. Pre-training pushes the model to develop structure, and this technique helps discover that hidden structure.