montroser today at 12:24 PM, 4 replies

The result is ~12 tokens per second, as reported by OP further down in these comments.

An impressive effort, and better than I would have thought possible on this hardware -- but still pretty far short of what one needs for a satisfactory interactive session.


Replies

andix today at 12:28 PM

Especially if you consider that those smaller models are really cheap and fast on platforms like OpenRouter: often cheaper than SOTA models by a factor of 100-500, and 2-5x faster in TPS.

causal today at 2:28 PM

Yeah, it took way too long to find that result. Being able to run on slow RAM isn't surprising, considering you can run a model off an SSD.

gowld today at 4:41 PM

Right. You can also perform RSA encryption with pencil and paper and a scientific calculator. It works, but the throughput isn't useful for serious work.

greenavocado today at 2:55 PM

I was about to ask that.