The biggest roadblock to using agents to maximum effectiveness like this is the chat interface. It's convenience as detriment and convenience as distraction. I've found myself repeatedly giving in to that convenience, only to realize that I have wasted an hour and need to start over because the agent is just obliviously circling the solution that I thought was fully obvious from the context I gave it. Clearly these tools are exceptional at transforming inputs into outputs and, counterintuitively, not as exceptional when the inputs are constantly interleaved with the outputs, as they are in chat mode.
> I thought was fully obvious from the context I gave it.
A lot of people think they have given the right instructions, but in most cases they miss some crucial points, and that leads the model in the wrong direction. Then the same people complain that AI is not good.