Hacker News

ekianjo · today at 9:49 AM

The eval rate of 20 tokens per second is the killer here. It means you can't use this to process any meaningful amount of text.

A GPU typically processes close to 1000 tokens/s during eval.
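To put the two rates in perspective, here is a rough sketch. It takes both figures at face value and assumes a hypothetical 10,000-token input, which is not from the thread. At a constant throughput, the time is just tokens divided by rate.

```python
def seconds_to_process(num_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock time to get through num_tokens at a constant throughput."""
    return num_tokens / tokens_per_second

# Hypothetical workload: a 10,000-token document (an assumption, not from the thread).
DOC_TOKENS = 10_000

for label, rate in [("20 tok/s", 20.0), ("1000 tok/s", 1000.0)]:
    print(f"{label}: {seconds_to_process(DOC_TOKENS, rate):.0f} s")
```

Under these assumptions, the same input takes 500 s at one rate and 10 s at the other.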


Replies

hnfong · today at 1:39 PM

The prompt is literally "why is the sky blue?" and consists of 7 tokens.

It's probably too small for the timings to be taken seriously.
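The sketch below shows why a 7-token run makes a poor benchmark. Any fixed per-run cost gets spread over only a handful of tokens, so the reported rate falls far below the steady-state one. The 0.1 s overhead and the 1000 tok/s "true" rate are illustrative assumptions, not measurements from the thread.

```python
def measured_rate(num_tokens: int, true_rate: float, fixed_overhead_s: float) -> float:
    """Tokens/s a timer would report if each run pays a fixed startup cost.

    true_rate and fixed_overhead_s are illustrative assumptions.
    """
    elapsed = fixed_overhead_s + num_tokens / true_rate
    return num_tokens / elapsed

TRUE_RATE = 1000.0  # assumed steady-state throughput, tok/s
OVERHEAD = 0.1      # assumed fixed cost per run, seconds

for n in (7, 100, 10_000):
    # Longer runs amortize the fixed cost and approach TRUE_RATE.
    print(n, round(measured_rate(n, TRUE_RATE, OVERHEAD), 1))
```

With these made-up numbers, the 7-token run reports well under a tenth of the true rate. The 10,000-token run comes within about 1% of it.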

boutell · today at 10:47 AM

I'm pretty sure eval time is token generation time, where it's actually outputting new tokens. If you're getting a thousand per second on that, I'd love to know on what.
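The distinction matters because prompt processing and generation are separate phases with different rates. Prompt tokens can be processed in parallel, while output tokens are produced one at a time. The sketch below assumes the tool reports the two phases separately and uses hypothetical timings that are not from the thread. It estimates latency for a "process a lot of text" job with a long input and a short output.

```python
from dataclasses import dataclass

@dataclass
class PhaseTiming:
    """Token count and wall-clock seconds for one inference phase."""
    tokens: int
    seconds: float

    @property
    def tokens_per_second(self) -> float:
        return self.tokens / self.seconds

def estimate_latency(prompt_tokens: int, output_tokens: int,
                     prefill: PhaseTiming, generation: PhaseTiming) -> float:
    """Estimate request latency from separately measured phase rates.

    Prompt tokens are charged at the prompt-processing rate; each output
    token is charged at the generation ("eval") rate.
    """
    return (prompt_tokens / prefill.tokens_per_second
            + output_tokens / generation.tokens_per_second)

# Hypothetical timings (assumptions, not measurements from the thread).
prefill = PhaseTiming(tokens=2000, seconds=2.0)     # 1000 tok/s prompt processing
generation = PhaseTiming(tokens=200, seconds=10.0)  # 20 tok/s generation

# Summarization-style job: long input, short output.
print(estimate_latency(8000, 100, prefill, generation))
```

Under these assumptions, a slow generation rate hurts less for summarizing long inputs than for producing long outputs. The prompt-processing rate dominates the first case, so which "tokens per second" figure is being quoted matters.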
