Hacker News

dnhkng · yesterday at 7:29 PM · 0 replies

There are similar patterns in the models from all the big labs. I think the transformer layer stack starts out 'undifferentiated', analogous to stem cells. Pre-training pushes the model to develop structure, and this technique helps discover that hidden structure.