Strangely, it is super fast on my 16 Plus, but with longer messages it can slow down a LOT, and not because of thermal throttling. I wish I could see some diagnostic data.
Inference from an LLM is O(tokens^2): self-attention compares every token against every other token, so longer conversations cost disproportionately more compute. That slowdown happens regardless of thermals.
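A back-of-the-envelope sketch of why that happens, assuming a single attention layer with an illustrative hidden size (`d_model` and the FLOP formula are simplifications, not any specific model's numbers):

```python
def attention_flops(seq_len: int, d_model: int = 64) -> int:
    """Rough FLOP count for one self-attention pass:
    Q @ K^T is (seq_len x d) @ (d x seq_len) -> ~2 * seq_len^2 * d,
    and applying the weights to V adds another ~2 * seq_len^2 * d."""
    return 4 * seq_len * seq_len * d_model

# Doubling the context quadruples the attention cost.
for n in (128, 256, 512, 1024):
    print(n, attention_flops(n))
```

So a message twice as long takes roughly four times the attention compute, which lines up with the "fast at first, much slower on long messages" behavior.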