Hacker News

rishabhaiover today at 12:23 AM

I used to think it was the quadratic complexity of attention, but I guess that's no longer a concern now that more hardware-aware attention kernels exist? The other limit I remember is continual learning, but that may be solved in the near future. I'm not completely confident about that.
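For context, a minimal numpy sketch (names and sizes are illustrative) of where the quadratic cost comes from: vanilla attention materializes an N-by-N score matrix over the sequence, which hardware-aware kernels such as FlashAttention avoid holding in memory all at once by computing it in tiles.

```python
import numpy as np

def naive_attention(q, k, v):
    # q, k, v: (seq_len, d) arrays. The score matrix below is
    # (seq_len, seq_len) -- the O(N^2) term in both compute and
    # memory that tiled, hardware-aware kernels avoid materializing.
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                              # (N, N)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)             # softmax rows
    return weights @ v                                         # (N, d)

N, d = 8, 4
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((N, d)) for _ in range(3))
out = naive_attention(q, k, v)
print(out.shape)  # (8, 4)
```

Fused kernels compute the same result, so the quadratic FLOPs remain; what changes is that the full N-by-N matrix never hits slow memory, which is what made long contexts practical.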


Replies

ACCount37 today at 12:51 AM

Humans do have an upper limit on working memory, which I see as the closest analogue to the "O(N^2) attention curse" of LLMs.

That doesn't stop an LLM from manipulating its context window to take full advantage of however much context capacity it has. Today's tools like file search and context compression are crude versions of that.
