Hacker News

antirez, today at 2:14 PM

Prefill runs at 400 t/s (tokens per second) on that hardware. If the prompt is very short, though, you can't see the real speed, because it falls back to single-token context processing.
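A toy model of why a short test prompt hides the prefill speed. Only the 400 t/s figure comes from the comment above. The single-token speed (20 t/s), the cutoff below which the engine processes the prompt one token at a time (32 tokens), and the prompt lengths are hypothetical numbers chosen for illustration, not measurements of any real engine.

```python
def prefill_seconds(n_tokens: int, batched_tps: float, single_tps: float, min_batch: int) -> float:
    """Time to process a prompt of n_tokens under a simple two-path model.

    Assumption (hypothetical): prompts shorter than min_batch tokens are
    processed one token at a time at single_tps instead of as a batch.
    """
    rate = batched_tps if n_tokens >= min_batch else single_tps
    return n_tokens / rate


def observed_tps(n_tokens: int, **model: float) -> float:
    """Throughput a benchmark would report: prompt tokens / elapsed time."""
    return n_tokens / prefill_seconds(n_tokens, **model)


# 400 t/s is from the comment; single_tps and min_batch are made-up values.
MODEL = dict(batched_tps=400.0, single_tps=20.0, min_batch=32)

if __name__ == "__main__":
    for n in (12, 2000):  # a short prompt vs. a long one (arbitrary lengths)
        print(f"{n:>5} prompt tokens -> observed {observed_tps(n, **MODEL):.0f} t/s")
```

Under these assumptions the short prompt reports the single-token rate and only the long prompt reveals the 400 t/s batched rate, which is the effect described in the comment.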


Replies

simonw, today at 6:02 PM

Hah, that's my fault for just using "Generate an SVG of a pelican riding a bicycle" as my test prompt!