Earlier this week I started testing Chinese models on my codebase. I haven’t really looked at interactive coding yet, but more at issue triage, bug auto-fixing, log analytics, etc.
I ran DeepSeek, Kimi, GLM, Qwen, and MiMO, with GPT-5.5 high as the reference, all in the Pi harness with nothing extra installed.
So far, Kimi and MiMO look the most promising to me. I haven’t tested them rigorously enough to make a strong statement, but my first impression is that, on typical daily tasks, all of these models may not be as far behind as people think.
They are a bit “work hard, not smart”: they get to same-ish results more slowly and with more tokens, but at a fraction of the price.
I personally really like DS4 Flash: it's the largest model I can run locally at decent speed, and I feel like it's good enough to maintain a codebase with less effort.
Maybe I need to give it a second chance. Surprisingly, Kimi 2.6 consistently failed to generate even a valid JSON plan, whereas Gemma 4 did really well, just slowly.
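For anyone who wants to reproduce the "valid JSON plan" check, here is a minimal sketch of how it could be done. The plan shape (an object with a non-empty `steps` list of strings) and the name `parse_plan` are assumptions made up for illustration, not what any particular harness actually expects.

```python
import json


def parse_plan(raw: str):
    """Return the parsed plan dict, or None if the output is not a usable plan.

    Assumed (hypothetical) plan shape: {"steps": ["...", "..."]}
    """
    try:
        plan = json.loads(raw.strip())
    except json.JSONDecodeError:
        # Not JSON at all: truncated output, prose around the JSON, etc.
        return None
    if not isinstance(plan, dict):
        return None
    steps = plan.get("steps")
    # Require a non-empty list in which every step is a string.
    if not isinstance(steps, list) or not steps:
        return None
    if not all(isinstance(step, str) for step in steps):
        return None
    return plan


if __name__ == "__main__":
    print(parse_plan('{"steps": ["read logs", "open issue"]}'))
    print(parse_plan('{"steps": ["read logs", '))
```

Counting how often this returns `None` over the same set of prompts would give a rough per-model failure rate, which is easier to compare than impressions from a handful of runs.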