Yeah, while the tools are really solid and have seen huge leaps these past two years, in no way will an LLM be able to do any of it unguided in two years. Just a humble opinion that I would love to see be wrong.
There's a huge difference between one-shot or few-shot prompting and building a robust harness with deterministic and adversarial quality gates. I'm finding that agents can actually do a pretty good job on a surprising number of things if you are very clear about your dimensions of quality, and about the rubrics you get agents to research and then validate against for each of those dimensions.
Make sure to use a deterministic pipeline or harness that goes step by step, so agents aren't checking their own work. I sometimes get alpha from having Codex check the work of Claude, but I am seeing pretty good output across multiple domains when I have three independent quality gates and a loop that only hands the result to a human if it doesn't converge at a reasonable cost.
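For anyone who wants something concrete, here is a rough sketch of the kind of loop I mean. Everything in it is made up for illustration: `run_harness`, `GateResult`, the toy gates and the flat `cost_per_round` are hypothetical names and assumptions, not anyone's actual setup. The generator is a stub standing in for an agent call, and in a real harness at least one gate would be a different model or a deterministic test suite rather than a string check.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class GateResult:
    passed: bool
    feedback: str = ""


@dataclass
class Outcome:
    status: str   # "accepted" or "escalate_to_human"
    draft: str
    rounds: int
    cost: float


# Hypothetical signatures: a gate inspects a draft, a generator gets the
# task plus the feedback from the gates that failed in the previous round.
Gate = Callable[[str], GateResult]
Generator = Callable[[str, List[str]], str]


def run_harness(task: str, generate: Generator, gates: List[Gate],
                cost_per_round: float = 1.0, budget: float = 5.0) -> Outcome:
    """Loop generate -> independent gates until all pass or budget runs out."""
    feedback: List[str] = []
    draft, cost, rounds = "", 0.0, 0
    while cost + cost_per_round <= budget:
        draft = generate(task, feedback)
        cost += cost_per_round
        rounds += 1
        # The harness, not the generator, decides what runs next.
        results = [gate(draft) for gate in gates]
        if all(r.passed for r in results):
            return Outcome("accepted", draft, rounds, cost)
        feedback = [r.feedback for r in results if not r.passed]
    # Did not converge at a reasonable cost: hand it to a person.
    return Outcome("escalate_to_human", draft, rounds, cost)


# Toy stand-ins so the sketch runs without any model calls.
def toy_generate(task: str, feedback: List[str]) -> str:
    draft = f"Summary of {task}. TODO"
    if "remove TODO markers" in feedback:
        draft = draft.replace(" TODO", "")
    if "add a source" in feedback:
        draft += " Source: internal notes."
    return draft


def has_source(draft: str) -> GateResult:
    return GateResult("Source:" in draft, "add a source")


def no_todo(draft: str) -> GateResult:
    return GateResult("TODO" not in draft, "remove TODO markers")


def short_enough(draft: str) -> GateResult:
    return GateResult(len(draft) <= 200, "shorten the draft")


def never_passes(draft: str) -> GateResult:
    return GateResult(False, "still not good enough")
```

Calling `run_harness("the Q3 report", toy_generate, [has_source, no_todo, short_enough])` is accepted on the second round, while swapping in `never_passes` with `budget=3.0` burns three rounds and comes back as `escalate_to_human`. The point of the shape is that the gates are separate callables the generator never controls, and the budget check is plain code rather than something an agent gets to argue with.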
> Just a humble opinion that I would love to see be wrong
Out of curiosity, why would you love to be wrong about that? What possible outcome could you see being a net positive for society if the vast majority of knowledge workers (and ultimately, as robotics progresses, most workers in general) are replaced by AI?
Yeah, it can do things unguided if the tests that confirm its correctness are very solid. That's where a lot of progress has been made and where agents are good, but it is domain-specific, and that's an opening where startups can shine.
"in no way will an LLM be able to do any of it unguided in two years"
IDK, "not any of it" seems a bit strong, especially thinking towards 2028. In a lot of knowledge professions, a surprising number of tasks are just dumb work compared to the rest.