> Kind of orthogonal to the discussion, but could you broadly describe the code you're working on that both models are bad at?
Commonly, anything that hasn't already been done across 100 different projects on GitHub.
Making a React app with a CRUD backend: LLMs are great. They've been trained on this.
Doing new work on complex non-public codebases or in niche problems that aren't commonly solved: completely different story. Sometimes they'll find enough information to piece together a path toward a solution, but that doesn't mean it's a good solution. I also have to feed in a lot more context, and I frequently have to stop them when they go down bad paths.
For the complex work I don't have the LLMs write code, though I may have them do a proof of concept. I have to write and understand everything myself. There are times when I'll think the LLM output looks good until I go through it line by line and realize it's done something completely unnecessary, or happened to get the right result for the wrong reasons. For unknown problems they're good at getting something to work through brute force if you let them consume enough tokens, but the result may rely on safety fallbacks from the OS or other fallbacks instead of being a proper solution. I always chuckle when they encounter intermittent errors and the first idea is to add a retry mechanism so the error is ignored.
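To make the retry point concrete, here is a minimal sketch of the pattern. Everything in it is hypothetical: `FlakyResource` and `fetch_with_blind_retry` are made-up names for illustration, not code from any real project or LLM output.

```python
import itertools


class FlakyResource:
    """Hypothetical stand-in for a call that fails on every odd-numbered attempt."""

    def __init__(self):
        self._calls = itertools.count(1)
        self.failures = 0

    def fetch(self):
        n = next(self._calls)
        if n % 2 == 1:  # odd-numbered calls fail
            self.failures += 1
            raise RuntimeError(f"intermittent failure on call {n}")
        return "ok"


def fetch_with_blind_retry(resource, attempts=3):
    """Retry until success, swallowing each error along the way."""
    last_error = None
    for _ in range(attempts):
        try:
            return resource.fetch()
        except RuntimeError as exc:
            last_error = exc  # dropped unless every attempt fails
    raise last_error


resource = FlakyResource()
result = fetch_with_blind_retry(resource)
# The caller sees a success, yet a failure happened and was never investigated.
print(result, resource.failures)  # prints: ok 1
```

The call "works", but the underlying cause of the failure is still there; the retry only hides it from whoever reads the result.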