Hacker News

steve-atx-7600 today at 1:23 PM · 1 reply

Inference from an LLM is O(tokens^2)


Replies

halJordan today at 3:53 PM

Only in naive implementations of attention. Softmax attention with a KV cache is still O(tokens²) total, since each new token attends to all previous ones, but linear-attention and state-space variants bring the total cost down to O(tokens).
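
A toy sketch of the two cost regimes being argued about here (the function names and the FLOP counting are illustrative, not a real profiler): with standard softmax attention plus a KV cache, generating token t attends to all t previous positions, so the total work over n tokens grows quadratically; a linear-attention-style recurrence keeps a fixed-size running state, so each step costs the same regardless of sequence length.

```python
def cached_softmax_attention_cost(n: int) -> int:
    """Illustrative unit cost of generating n tokens with a KV cache:
    step t attends to t previous positions, so total = 1+2+...+n."""
    return sum(t for t in range(1, n + 1))  # n*(n+1)/2, i.e. O(n^2)

def linear_attention_cost(n: int) -> int:
    """Illustrative unit cost for a linear-attention/recurrent variant:
    a constant-size state is updated once per token, so total = n."""
    return n  # O(n)

if __name__ == "__main__":
    for n in (100, 1000):
        print(n, cached_softmax_attention_cost(n), linear_attention_cost(n))
```

Doubling the sequence length roughly quadruples the first count but only doubles the second, which is the distinction the reply is drawing.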