> look at say Claude Sonnet 3.x. It's an entire world away in like a year
In the area I work, I find them to be of very little value, both then and now... I see no real difference. They help with marginal tasks, e.g. catching typos, or helping new programmers explore an existing codebase faster.
So far, I haven't used a single line of code generated by AI, even though I've seen thousands. Some of them drew attention to a problem, but none solved one successfully. It was all pretty lame.
I see no reason to believe it's going to get better. Waving hands more forcefully isn't helping; there's no argument behind the promise of "it will get better".
But, more importantly, the AI is applied at a level where the really important things don't happen. It's automating boilerplate work; it doesn't make decisions about the important parts. In the example above, the AI is not capable of choosing the better strategy: use pyproject.toml, or write code to build Python packages? It's not the kind of decision it's called to make, and nobody sensible would trust it with such a decision, because there isn't a clear right or wrong answer; only the future will prove one choice or the other to be the right call.
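To make that example concrete, the choice is roughly between a declarative config file and an imperative build script; something like the following, with `mypackage` as a made-up placeholder:

```toml
# Declarative route: a minimal pyproject.toml (PEP 621 metadata, setuptools backend)
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "mypackage"
version = "0.1.0"
dependencies = ["requests"]
```

```python
# Imperative route: the legacy setup.py equivalent of the config above
from setuptools import setup, find_packages

setup(
    name="mypackage",
    version="0.1.0",
    packages=find_packages(),
    install_requires=["requests"],
)
```

The declarative route is simpler and tool-friendly; the imperative one can run arbitrary logic at build time. Which trade-off wins depends on where the project goes, which is exactly why there's no clear right answer up front.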
> So far, I haven't used a single line of code generated by AI, even though I've seen thousands. Some of them drew attention to a problem, but none solved one successfully. It was all pretty lame.
I find this statement highly suspect. AI coding agents nowadays can spot subtle object-lifetime management issues and even dependency-lifecycle incompatibilities, yet here you are stating you can't use them to fix anything? How strange.
Not to mention that coding agents excel at creating greenfield projects and migrating entire codebases to new frameworks.
But if you feel you can't use them, then I feel sorry for you.
If you honestly don’t believe there is a major difference between 3.x and 4.7, I don’t think there is much anyone can do to convince you. I do find it disappointing when technical professionals are so uninterested in building a real understanding of a fairly complex topic.
> I see no reason to believe it's going to get better. Waving hands more forcefully isn't helping; there's no argument behind the promise of "it will get better".
That’s a real bummer to read from someone who sounds like a professional, and not just a professional but someone thoughtful and smart. Thirty years of brilliant work in RL, Bayesian stats, machine learning, and measurement; trillions of dollars of funding and some of the best talent in the world; and your assertion is “I tried it on my codebase and I didn’t like it, and that trumps entire fields of mathematics and statistics.” I mean, have you heard of the Chinchilla scaling laws? Do you know how RL works? Are you aware of benchmarks and their strengths and weaknesses? Are you following adoption numbers, or accomplishments like new proofs of unsolved Erdős problems?
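For reference, the Chinchilla result (Hoffmann et al., 2022) isn’t hand-waving; it fits pre-training loss as a function of parameter count N and training tokens D:

```latex
% Chinchilla parametric loss fit (Hoffmann et al., 2022),
% where N = parameter count, D = training tokens:
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
% fitted constants, roughly: E ~ 1.69, A ~ 406.4, B ~ 410.7,
% alpha ~ 0.34, beta ~ 0.28
```

That is the kind of measured, predictive relationship sitting behind “it will get better”, not just vibes.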
> But, more importantly, the AI is applied at a level where the really important things don't happen. It's automating boilerplate work.
Your experiences are your experiences; I don’t know what work you do, how it gets done, or what languages you’re working with. But we’re literally at the point where the vast majority of code at major tech companies is fully AI-written (not just AI-assisted).
> It's not the kind of decision it's called to make, and nobody sensible would trust it with such a decision, because there isn't a clear right or wrong answer
What are you claiming is fundamentally impossible for an AI to do here that a human can do? People make judgement calls on ambiguous problems, taking into account vast amounts of context about the business, dev time, reliability, maintenance, etc. Why do you think an AI can’t do that?