
xienze • yesterday at 11:34 PM • 3 replies

I don't want to be a jerk, but 31 t/s prefill is basically unusable in an agentic situation. With a mere 10k tokens of context, you're sitting there for 5+ minutes before the first token is generated.
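The estimate is just prompt length divided by prefill rate. Here is a minimal Python sketch of that arithmetic. It assumes prefill runs at a constant rate and ignores decode time and any fixed per-request overhead:

```python
def time_to_first_token(prompt_tokens: int, prefill_tps: float) -> float:
    """Seconds spent in prefill before the first output token.

    Assumes a constant prefill rate; decode time and fixed overheads are ignored.
    """
    if prefill_tps <= 0:
        raise ValueError("prefill rate must be positive")
    return prompt_tokens / prefill_tps


if __name__ == "__main__":
    secs = time_to_first_token(10_000, 31)
    print(f"{secs:.0f} s ≈ {secs / 60:.1f} min")  # prints "323 s ≈ 5.4 min"
```

At 31 t/s, 10,000 tokens works out to about 323 seconds, which matches the "5+ minutes" figure.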

Replies

fgfarben • today at 1:00 AM

That prefill number isn't right. M4 Max hits 200-300 t/s: https://github.com/antirez/ds4/blob/main/speed-bench/m4_max_...


throwdbaaway • today at 7:04 AM

Hah, that's because the prompt itself was only about 30 tokens. We need a much bigger prompt to properly test PP (prompt processing).
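One way to test PP properly is to time prefill at several agent-sized prompt lengths instead of one tiny prompt. The sketch below is hypothetical. `prefill(n)` stands in for whatever hook your engine exposes to process an n-token prompt, and the default lengths are arbitrary choices:

```python
import time
from typing import Callable, Dict, Iterable


def sweep_prefill(prefill: Callable[[int], None],
                  lengths: Iterable[int] = (512, 2048, 8192, 16384)) -> Dict[int, float]:
    """Time a prefill callable at several prompt lengths; return tokens/s per length.

    `prefill(n)` is a hypothetical hook that processes an n-token prompt
    and returns once the KV cache has been filled.
    """
    results: Dict[int, float] = {}
    for n in lengths:
        start = time.perf_counter()
        prefill(n)
        # Guard against a zero reading from a trivially fast callable.
        elapsed = max(time.perf_counter() - start, 1e-9)
        results[n] = n / elapsed
    return results
```

A ~30-token prompt reports a rate, but it says little about throughput at 10k+ tokens. One plausible reason is that fixed per-request costs dominate tiny prompts, while attention cost grows with context. Sweeping the lengths shows how the rate actually behaves.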

aiscoming • yesterday at 11:37 PM

If it's just the coding agent's system prompt and tool definitions, you can cache that.
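Prefix caching works because the system prompt and tool definitions sit unchanged at the start of every request. Only the tokens after the longest already-cached prefix need prefill. Below is a toy Python sketch. Real engines cache KV tensors, usually in fixed-size blocks. This hypothetical `PrefixCache` only tracks token sequences, to show how much prefill remains:

```python
from typing import List, Set, Tuple


class PrefixCache:
    """Toy prefix cache: records which token prefixes already have KV state.

    Hypothetical illustration only; no tensors are stored.
    """

    def __init__(self) -> None:
        self._prefixes: Set[Tuple[int, ...]] = set()

    def store(self, tokens: List[int]) -> None:
        self._prefixes.add(tuple(tokens))

    def tokens_to_prefill(self, prompt: List[int]) -> int:
        """Tokens of `prompt` not covered by the longest cached prefix."""
        best = 0
        for prefix in self._prefixes:
            n = len(prefix)
            if best < n <= len(prompt) and tuple(prompt[:n]) == prefix:
                best = n
        return len(prompt) - best


if __name__ == "__main__":
    cache = PrefixCache()
    system_and_tools = list(range(8000))  # assumed 8k-token system prompt + tools
    cache.store(system_and_tools)
    prompt = system_and_tools + list(range(100_000, 102_000))  # 2k new tokens
    todo = cache.tokens_to_prefill(prompt)
    print(todo, f"{todo / 31:.1f} s at 31 t/s")  # prints "2000 64.5 s at 31 t/s"
```

Under these assumed numbers, a 10k-token request drops from about 323 s of prefill to about 65 s. The cache only helps while the prefix stays identical. Any change early in the prompt invalidates everything after it.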
