Qwen3.6-35B-A3B (or whatever its full name is), running on a 3090, can at the very least, with very little fine-tuning, compete with Haiku and blow away GPT-4.1 (aka the cheap models).
It might keep up with Sonnet 4.5 with some tinkering.
But long story short: it seems to offer better performance and similar quality, at the cost of trailing the cloud models by a year or so. It's the same tradeoff as self-hosting in general: faster/easier/cheaper than cloud hosting, if you're okay with the downsides.
I'm returning my 3090 soon for an R9700 after some more basic benchmarking, since the extra VRAM should improve these results further.
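A rough sketch of why VRAM is the binding constraint here. The bits-per-weight figures below are my own approximations (GGUF quants keep some tensors at higher precision, so effective bits-per-weight runs a bit above the nominal number), and note that for a MoE model like the A3B, all expert weights must fit in memory even though only ~3B parameters are active per token, so memory scales with total parameter count:

```python
# Back-of-the-envelope VRAM needed for just the weights of a ~35B-parameter
# model at common quantization levels. Excludes KV cache, activations, and
# runtime overhead, which add several more GiB in practice.
PARAMS = 35e9  # total parameters, including all MoE experts

def weight_gib(bits_per_weight: float) -> float:
    """Approximate weight memory in GiB for a given average bits-per-weight."""
    return PARAMS * bits_per_weight / 8 / 2**30

# Approximate effective bits-per-weight (assumption, not measured):
for name, bpw in [("BF16", 16.0), ("Q8_0", 8.5), ("Q4_K_M", 4.85)]:
    print(f"{name:7s} ~{weight_gib(bpw):5.1f} GiB")
```

Under these assumptions, only a Q4-class quant fits entirely in a 3090's 24 GiB, Q8 overflows even 32 GiB (forcing CPU offload and the slowdown that comes with it), and BF16 is far out of reach on a single consumer card, which is roughly what the quant-vs-speed tradeoffs below reflect.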
> It might keep up with Sonnet 4.5 with some tinkering.
I would love to see that. I've been using Qwen3.6 35B and the dense 27B, and both are too slow, with not-so-great results, for agentic coding tasks. They're okay, but not impressive. I had better luck with the BF16 and Q8 than with the Q4 from unsloth (I really love what unsloth is doing in this space). Another problem I had with Qwen, which I never encountered with Sonnet: even the BF16 gets stuck and needs a "continue task" prompt from time to time, and the lower quants are even worse in that regard.
If you get some interesting results, I would love to read about it!