Hacker News

flumpcakes today at 12:19 AM | 2 replies

How does Pi+Qwen (local) compare to Anthropic's offerings? Surely you're not getting the same breadth and quality of output using Qwen? How is the performance?


Replies

skeledrew today at 1:48 AM

So far I've only really set things up and done some benchmarking on several local models (Qwen 1.7b, 4b, 9b & 35b a3b), using a set of capability prompts created and evaluated by Claude, plus HumanEval and MBPP (haven't completed the latter two). Results ranged from the 1.7b getting 6/8 correct at ~14.7 tok/s on the capability set, up to the 35b getting 8/8 at ~4.5 tok/s; I can share full results if interested. I've also set up llama-swap so I can dynamically select between them. Now I need to decide which of my projects I'll really be testing them on, with the awareness that I'll have to be even more involved.
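For anyone curious what the "N/8 correct" style of scoring looks like under the hood, here's a minimal sketch of a HumanEval/MBPP-style pass/fail check: concatenate the prompt and the model's completion, then execute the problem's reference assertions. The problem, completion, and function names below are hypothetical stand-ins, not actual benchmark items, and real harnesses sandbox and time out the `exec` step.

```python
def check_completion(prompt: str, completion: str, test_code: str) -> bool:
    """Return True if prompt + completion passes the reference assertions."""
    program = prompt + completion + "\n" + test_code
    env: dict = {}
    try:
        # Real harnesses run this in a sandboxed subprocess with a timeout;
        # a bare exec is only acceptable for a toy illustration like this.
        exec(program, env)
        return True
    except Exception:
        return False

# Hypothetical one-problem mini-benchmark.
prompt = "def add(a, b):\n"
completion = "    return a + b\n"   # pretend this came from the model
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0\n"

passed = check_completion(prompt, completion, tests)
print(f"pass rate: {int(passed)}/1")
```

Summing that boolean over all problems gives the "6/8 correct" numbers, and tok/s comes from the server's timing stats rather than this harness.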

dymk today at 3:21 AM

It’s a toy compared to Opus or Sonnet. Obviously 5-trillion-parameter models running on $$$$ hardware are going to outperform a local model.