> I don't think that AIs have become more trustworthy, the errors are just more subtle.
Honest question: what about the counter-argument that humans make subtle mistakes all the time, so why do we treat AI any differently?
A difference, to me, is that when we write code manually, we reason about it carefully and with purpose. Yes, we make mistakes, but those mistakes fall within a predictable range. AI-generated code, in contrast, produces errors that defy common sense. That said, I don't feel this distinction is strong enough, and I don't have data to back it up.
Humans can't make mistakes at the sheer scale that AI can.
Yes, as an engineer I make mistakes, but I could never make as many mistakes per day as an LLM can.
> Honest question: what about the counter-argument that humans make subtle mistakes all the time, so why do we treat AI any differently?
We're investing in the human getting better, rather than paying $100 to Anthropic and hoping that's enough to keep them from making the product worse.
This is like having a coworker who's as skilled as you if not more skilled, but also an alien.
Their mental model doesn't map cleanly onto yours. With a human, you'd have some way to follow their thought patterns and identify mistakes; the alien makes mistakes that just don't add up.
Like, the alien has encyclopedic knowledge of the opcodes of some esoteric Soviet MCU, but sometimes forgets how to look up a function definition and says, "It looks like the read tool failed, that's ok, I can just make a mock implementation and comment out the test for now."
One answer, as another person pointed out, is that LLM mistakes are just different. They are less explicable, less predictable, and therefore harder to spot. I can easily anticipate how an inexperienced engineer is going to mess up their first pull request for my project. I have no idea what an LLM might do. Worse, I know it might ace the first fifty pull requests and then make an absolutely mind-boggling mistake in the 51st one.
But another answer is that human autonomy is coupled to responsibility. For most line employees, if they mess up badly enough, it's first and foremost their problem: they get a bad performance review, get fired, or end up in court or even in prison. Because you bear responsibility for your actions, your boss doesn't have to watch what you're up to 24x7. Their own career is typically not on the line unless they're deeply complicit in your misbehavior.
LLMs have no meaningful responsibility, so whoever operates them is ultimately on the hook for what they do. It's a different dynamic. It's probably why most software engineers are not gonna get replaced by robots - your director or VP doesn't want to be liable for an agent that goes haywire - but it's also why "oh, I'll have an army of 50 YOLO agents do the work while I'm browsing Reddit" is probably not a wise strategy for line employees.