Given how many "fundamental" limitations of AI have been resolved within the past few years, I'm skeptical. Even if you're right, I am not sure that the limitations you identified matter all that much in practice. I think very few human engineers are working on problems which are so novel and unique that AIs cannot grasp them without additional reinforcement learning.
> it will delete all the files in "X/"
How many "I deleted the prod database" stories have you seen? Humans do this too.
> follow arbitrary instructions from an attacker found in random documents
This is just the AI equivalent of phishing: an inability to distinguish authorized from unauthorized requests.
Whenever people start criticizing AI, they always seem to conveniently leave out all the stupid crap humans do and compare AI against an idealized human instead.
> How many "I deleted the prod database" stories have you seen? Humans do this too.
Humans do it accidentally.
Sorry, but you're mistaking outputs for process. If you actually knew what models are doing under the hood to produce output that (admittedly) looks very convincing, you'd quickly realize that they are simply exceptionally good at statistically predicting the next token in a stream of tokens. The reason you're having to become an expert at context engineering, and the reason the labs still hire engineers, is that turning next-token prediction into something that can simulate general intelligence isn't easy.
The boundaries of these systems are very easy to find, though. Try to play any kind of game with them that isn't a prediction game, or perhaps even some that are (try playing chess with an LLM; it's amusing).
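To make the "statistically predicting the next token" point concrete, here's a toy sketch: a bigram count table that always emits the most frequent continuation of the current token. (The corpus and function names are hypothetical; real LLMs do this over tens of thousands of subword tokens with a neural network, not a count table, but the greedy next-token loop is the same shape.)

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus standing in for training data.
corpus = "the cat sat on the mat the cat ate the rat".split()

# Count how often each token follows each other token.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(token):
    """Greedy decoding: return the statistically most likely continuation."""
    return follows[token].most_common(1)[0][0]

# Generate a short stream by repeatedly predicting the next token.
out = ["the"]
for _ in range(4):
    out.append(predict_next(out[-1]))
print(" ".join(out))
```

Nothing in that loop "understands" cats or mats; it only knows which token tends to come next, which is why it breaks down the moment you ask it to play a game whose rules aren't recoverable from surface statistics.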
Well, LLMs are way more stupid, doing things that even most juniors wouldn't do (and you don't give prod access to a new junior hire, do you?). Most people are super careful with LLMs: they simply don't trust them and don't let them anywhere near critical infra or data. That's seniority 101.
Which fundamental limitation do you mean? I haven't seen anything but slow, iterative improvements. Sure, it feels fine; a turtle can eventually do a 10,000-mile trek, but just because it's moving its left and right feet and decreasing the distance doesn't mean it's getting there anytime soon.
The parent mentioned hurdles way harder than iterative increments can tackle, requiring radically new... everything.
> How many "I deleted the prod database" stories have you seen? Humans do this too.
Humans generally do it by accident. They don't preface it with "Let me delete the production database," which LLMs do.
> Given how many "fundamental" limitations of AI have been resolved within the past few years
Eh? Which limitations were solved?
> How many "I deleted the prod database" stories have you seen?
If you've used the latest models extensively, you must've noticed times when AI 'runs out of common sense' and keeps trying stupid stuff.
I'm somewhat convinced that the amazing (and improving!) coding ability of these LLMs comes from being RLHF'd on the conversations they're having with programmers, with each successfully resolved bug and implemented feature ending up in training data.
Thus we are involuntarily building the world's biggest stackoverflow.
Which, for the record, is incredibly useful, and may even put most programmers out of a job (who I think at that point should feel a bit stupid for letting this happen), but it's not necessarily AGI.