lhl • today at 12:59 PM • 1 reply

I've been testing Ornith-1.0 35B (my own FP8-block quant) and I like it. It runs at >200 tok/s w/ vLLM on an RTX PRO 6000 (sm120), and I've run >140M cached tokens of agentic coding work through it over the past few days. It seems to land somewhere between Qwen 3.6 35B-A3B and 27B, but the good thing is that it overthinks/doom-loops a lot less than Qwen 3.6. Looking at the thinking traces, I like its breakdown approach template.

It does a good job on basic analysis, tasks, and some front-end/back-end changes on a medium-sized Go codebase, but it reached its limits and totally botched a longer (simple) kernel implementation job (about 100 iterations in the Pi Agent harness). This is the type of thing that stronger open models (Kimi K2.6, GLM 5.2) are able to do.

Replies

regularfry • today at 4:17 PM

With this model size I've found that the harness seems to matter more. Personally, I've moved on to little-coder rather than raw pi with qwen3.6 27b; it might be worth taking a look.
