amluto • yesterday at 4:48 PM

Even ignoring the superior caching you get on a local setup, Mac hardware can often process input tokens around 10x as quickly as it produces output tokens. OpenRouter seems to show only a 2x difference on the same models.

Replies

bigyabai • yesterday at 9:26 PM

For larger contexts (e.g., 20,000+ token agent workflows), being 10x faster still isn't enough. You have to be roughly 100x faster at crunching contexts for it to feel like real time.
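A quick back-of-the-envelope sketch of why the multiplier matters at this context size. It assumes a hypothetical decode speed of 20 output tokens/s (not a figure from the thread) and computes how long prefill of a 20,000-token context takes when input processing is 2x, 10x, or 100x faster than output. It ignores prompt caching, which would cut the cost of re-sending an unchanged prefix.

```python
# Hypothetical throughput figure; not a measurement from the thread.
DECODE_TOK_PER_S = 20.0  # assumed output (decode) speed in tokens/s


def prefill_seconds(context_tokens: int, prefill_multiplier: float,
                    decode_tok_per_s: float = DECODE_TOK_PER_S) -> float:
    """Seconds to ingest a prompt when prefill runs `prefill_multiplier`
    times faster than decode. Ignores caching and any fixed overhead."""
    prefill_tok_per_s = decode_tok_per_s * prefill_multiplier
    return context_tokens / prefill_tok_per_s


if __name__ == "__main__":
    for mult in (2, 10, 100):
        t = prefill_seconds(20_000, mult)
        print(f"{mult:>3}x prefill: {t:6.1f} s before the first output token")
```

With these assumed numbers, 10x still means about 100 s of waiting per 20,000-token turn, while 100x brings it down to about 10 s.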
