Rules of thumb:
The more your toolchain (compilers, linters, etc.) can statically verify, the better agents will do.
The terser the code, the better agents will do.
The more often similar problems have been solved in open source, the better agents will do. Agents seem particularly good at plumbing together different pieces of software.
Anything that requires a judgement call, as opposed to having one obvious way to do it, will get worse results from an agent.
As the scope of the request grows, agents get worse at it. This can be mitigated somewhat by breaking the work into smaller requests ("write a plan", "do step 1 of the plan", etc.), but never fully resolved. At some point the task is so big that large parts have to be done by hand.
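
The plan-then-step approach from the last rule could be sketched roughly as below. This is only an illustration: `run_in_steps` and `send` are hypothetical names, and `send` stands in for whatever function passes a prompt to your agent and returns its reply as text. It also assumes the agent returns its plan as one step per line, which a real agent may not do.

```python
from typing import Callable, List


def run_in_steps(task: str, send: Callable[[str], str]) -> List[str]:
    """Ask for a plan first, then request one step per prompt.

    `send` is a hypothetical stand-in for whatever sends a prompt to
    the agent and returns its reply as text.
    """
    plan = send(f"Write a numbered plan for this task, one step per line: {task}")
    # Assumption: every non-empty line of the reply is one step of the plan.
    steps = [line.strip() for line in plan.splitlines() if line.strip()]
    replies = []
    for number, step in enumerate(steps, start=1):
        # Each request is scoped to a single step, keeping the ask small.
        replies.append(send(f"Do step {number} of the plan: {step}"))
    return replies
```

Each reply is a natural point to review the result by hand before moving on, which is where the remaining manual work tends to go.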