
CamperBob2 · last Wednesday at 11:55 PM

Not the GP, but the new Qwen3-Coder-Next release feels like a step change, at 60 tokens per second on a single 96GB Blackwell. And that's at full 8-bit quantization and 256K context, which I wasn't sure was going to work at all.

It is probably enough to handle a lot of what people use the big-3 closed models for. Somewhat slower and somewhat dumber, granted, but still extraordinarily capable. It punches way above its weight class for an 80B model.
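For a rough sense of why 8-bit at 256K context was in doubt on a 96GB card, here's a back-of-envelope memory sketch; the per-parameter byte count and headroom figures are assumptions, not measured numbers:

```python
# Back-of-envelope VRAM budget for an ~80B-parameter model at 8-bit
# quantization on a single 96 GB card. All figures are rough assumptions.

params = 80e9             # total parameters (80B)
bytes_per_param = 1.0     # ~1 byte per parameter at 8-bit

weights_gb = params * bytes_per_param / 1e9   # ~80 GB of weights
vram_gb = 96
headroom_gb = vram_gb - weights_gb            # ~16 GB for KV cache, activations

print(f"weights ~{weights_gb:.0f} GB, headroom ~{headroom_gb:.0f} GB")
# Whether a 256K-token KV cache fits in ~16 GB depends on layer count,
# KV-head count, and cache quantization, hence the uncertainty above.
```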


Replies

redwood_ · yesterday at 12:04 AM

Agree, these new models are a game changer. I switched from Claude to Qwen3-Coder-Next for day-to-day work on dev projects and don't see a big difference. I just use Claude when I need comprehensive planning or review. I'm running Qwen3-Coder-Next at Q8 with 256K context.

paxys · yesterday at 3:13 AM

"Single 96GB Blackwell" is still $15K+ worth of hardware. You'd have to use it at full capacity for 5-10 years to break even when compared to "Max" plans from OpenAI/Anthropic/Google. And you'd still get nowhere near the quality of something like Opus. Yes there are plenty of valid arguments in favor of self hosting, but at the moment value simply isn't one of them.

zozbot234 · yesterday at 12:01 AM

IIRC, that new Qwen model has 3B active parameters so it's going to run well enough even on far less than 96GB VRAM. (Though more VRAM may of course help wrt. enabling the full available context length.) Very impressive work from the Qwen folks.
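A sketch of the bandwidth intuition behind that claim: with ~3B active parameters at 8 bits, each decoded token reads roughly 3 GB of expert weights, so even partial CPU offload can stay usable. The bandwidth figures below are illustrative assumptions, not measurements of any specific card:

```python
# Decode speed is roughly bound by how many bytes each token must read,
# and a sparse MoE only reads its *active* experts per token.

active_params = 3e9                       # ~3B active parameters
gb_per_token = active_params * 1.0 / 1e9  # ~3 GB read per token at 8-bit

for name, bandwidth_gb_s in [("GPU VRAM, ~1500 GB/s", 1500),
                             ("CPU DDR5, ~80 GB/s", 80)]:
    print(f"{name}: upper bound ~{bandwidth_gb_s / gb_per_token:.0f} tokens/s")
# Even with experts spilled to system RAM, ~3 GB/token keeps generation
# usable, which is why cards well under 96 GB of VRAM can still run it.
```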