Result is ~12 tokens per second, as reported by OP further down in these comments.
An impressive effort, and better than I would have thought possible on this hardware -- but still pretty far short of what one needs for a satisfactory interactive session.
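For a rough sense of what ~12 tokens per second means in practice, here is a back-of-the-envelope sketch. The 500-token reply length is my own assumption for illustration, not a figure from OP, and `seconds_to_generate` is just a hypothetical helper name.

```python
def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Time to stream a reply of num_tokens at a steady decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# Assumed reply length (hypothetical); the ~12 tok/s rate is the one reported above.
reply_tokens = 500
reported_tps = 12.0

wait = seconds_to_generate(reply_tokens, reported_tps)
print(f"{reply_tokens} tokens at {reported_tps:g} tok/s takes about {wait:.1f} s")
```

This ignores prompt processing time, so the real wait before a full answer would likely be longer.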
Yeah, it took way too long to find that result. Being able to run on slow RAM isn't surprising, considering you can run a model off an SSD.
Right. You can also perform RSA encryption with pencil, paper, and a scientific calculator. It works, but the throughput isn't useful for serious work.
I was about to ask that
Especially if you consider that those smaller models are really cheap and fast on platforms like OpenRouter. They are often cheaper than SOTA models by a factor of 100-500, and 2-5x faster in TPS.