So far I've mostly just been setting things up and doing some benchmarking: a set of capability prompts created and evaluated by Claude, plus HumanEval and MBPP (haven't finished the latter two yet), run against several local models (Qwen 1.7b, 4b, 9b & 35b a3b). Results ranged from the 1.7b getting 6/8 correct on the capability set at ~14.7 tok/s, up to the 35b getting 8/8 at ~4.5 tok/s; I can share the full results if anyone's interested. I also set up llama-swap so I can switch between models dynamically. Next I need to decide which of my projects I'll actually be testing them on, knowing that I'll have to be much more hands-on.
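For anyone curious about the llama-swap piece: it's driven by a single YAML config that maps model names to the command that serves them, and the proxy starts/stops backends on demand. Here's a minimal sketch of what mine looks like; the model names and GGUF paths are placeholders for my setup, not something you can copy verbatim:

```yaml
# llama-swap config sketch (paths and names are illustrative)
models:
  "qwen-1.7b":
    # ${PORT} is filled in by llama-swap at launch time
    cmd: llama-server --port ${PORT} -m /models/qwen-1.7b.gguf
  "qwen-35b-a3b":
    cmd: llama-server --port ${PORT} -m /models/qwen-35b-a3b.gguf
```

Then I just point my OpenAI-compatible client at the llama-swap port and pick a model by name in the request; it swaps the running backend for me.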