montroser • today at 12:24 PM • 4 replies

Result is ~12 tokens per second, as reported by OP down in these comments here.

An impressive effort, and better than I would have thought possible on this hardware -- but still pretty far short of what one needs for a satisfactory interactive session.

Replies

andix • today at 12:28 PM

Especially if you consider that those smaller models are really cheap and fast on platforms like OpenRouter. They are often 100-500x cheaper than SOTA models, with 2-5x the TPS.

causal • today at 2:28 PM

Yeah, took way too long to find that result. Being able to run on slow RAM isn't surprising, considering you can run a model off an SSD.

gowld • today at 4:41 PM

Right. You can also perform RSA encryption with pencil and paper and a scientific calculator. It works, but the throughput isn't useful for serious work.

greenavocado • today at 2:55 PM

I was about to ask that
