Looks like it's about a year behind. Not that I am complaining. A year behind is good progress.
I also feel much of the trick is in the reasoning and the harness,
so some progress around that would accelerate this process.
And what do you base this on?
How does one objectively quantify how it stacks up to another model?
Or even, what is your subjective evaluation based on?
I really wonder, because I have just finished a fully vibe-coded GTK/Rust/Lua application with me basically writing 7% of the code (all in one module) and GLM 5.1 writing the rest. We haven’t had regressions, confusion or anything else. And I am pretty damned sure I couldn’t have managed this a year ago with Claude Code and Sonnet.
Harness certainly matters a lot, though GLM is pretty forgiving. I just had Opus tell me that based on numbers over the last week, from quite a few billion tokens total across half a dozen providers, GLM 5.1 has been more reliable for one of my projects than Sonnet... Just switching on 5.2 now.