Hacker News

pseudony today at 7:32 AM

And what do you base this on?

How does one objectively quantify how it stacks up to another model?

Or even, what is your subjective evaluation based on?

I really wonder, because I have just finished a fully vibe-coded GTK/Rust/Lua application, with me basically writing 7% of the code (all in one module) and GLM 5.1 writing the rest. We haven't had regressions, confusion or anything else. And I am pretty damned sure I couldn't have managed this a year ago with Claude Code and Sonnet.


Replies

lejalv today at 1:05 PM

What harness, if you don't mind sharing?
