A Qwen3.6-35B-A3B (or whatever its full name is), running on a 3090, can with very little tuning at the very least compete with Haiku and blow away GPT-4.1 (i.e., the cheap models).
It might keep up with Sonnet 4.5 with some tinkering.
But long story short: compared to cloud models, it seems to offer better performance and similar quality, trailing them by only a year or so. It's the same trade-off as self-hosting vs. cloud hosting: faster/easier/cheaper, if you're okay with the downsides.
I'm returning my 3090 soon and swapping in an R9700 after some more basic benchmarking, since the extra VRAM should strengthen my observations.