Hacker News

redox99 today at 9:06 AM

In the 90s you didn't have norm layers, residuals, attention, and more.

So you were missing a lot of the building blocks that make up LLMs. It's not just a matter of having the compute.


Replies

sirsinsalot today at 10:24 AM

I think the attention mechanism is so simple yet so revolutionary that people forget it had to be invented.

Like the best leaps in thinking, once it is made, it is immediately obvious and intuitive.
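
To give a sense of how little code the core idea needs, here is a rough sketch of single-head scaled dot-product attention in plain Python. The function names `softmax` and `attention` are just illustrative, and it leaves out everything a real model adds on top (learned projections, multiple heads, masking, batching), so treat it as a toy, not as how any particular library does it.

```python
import math


def softmax(xs):
    """Turn a list of scores into weights that sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Toy scaled dot-product attention over lists of vectors.

    For each query: score it against every key, softmax the scores,
    and return the weighted average of the value vectors.
    """
    d_k = len(keys[0])
    out = []
    for q in queries:
        # Dot product of the query with each key, scaled by sqrt(d_k)
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output dimension at a time
        out.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return out
```

Each output row is just a weighted average of the value vectors, with the weights decided by how well the query matches each key.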
