It comes down to trust. I was not able to trust GPT-4.1 or Sonnet 3.5 with anything other than short, well-specified tasks. If I let them go too long (e.g. in long Cursor sessions), they would lose the plot and start thrashing.
With better models and harnesses (e.g. Claude Code), I can now trust the AI more than I would trust a junior developer in the past.
I still review Claude's plans before it begins, and I try out its code after it finishes. I do catch errors on both ends, which is why I haven't taken myself out of the loop yet. But we're getting there.
Most of the time, the way I "verify" the code is behavioral: does it do what it's supposed to do? Have I tried sufficient edge cases during QA to pressure-test it? Do we have good test coverage to prevent regressions and check critical calculations? That's about as far as I ever took human code verification. If anything, I have more confidence in my codebases now.