> Writing detailed specs and then giving them to an AI is not the optimal way to work with AI.
It is NOT how you work with humans either, because most software engineers I worked with in my career were incredibly smart and damn good at identifying edge cases and weird scenarios, even when they were not told about them and the domain wasn't theirs to begin with. You didn't need to write lengthy, several-page Jira tickets. Just a brief paragraph, and that's it.
With AI, you need to spell everything out in detail. But even that is NO guarantee, because these models are NOT deterministic in their output. Same prompt, different output each time. That's why every chat box has that "Regenerate" button. So even a correct and detailed prompt might not lead to correct output. You're literally rolling dice with a random number generator.
Lastly - no matter how smart and expensive the model is, the underlying working principles are the same as GPT-2's: the same transformers with RL on top, the same list of token probabilities, and the same temperature used to randomly select one token to complete the output, which is then fed back in to generate the next token.
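To make the sampling loop you describe concrete: here's a minimal, illustrative sketch of temperature sampling (not any particular model's actual implementation - a toy version of the "probabilities + temperature + random pick" step):

```python
import math
import random

def sample_token(logits, temperature=1.0, rng=None):
    """Sample one token id from raw logits using temperature scaling.

    temperature < 1 sharpens the distribution (more deterministic),
    temperature > 1 flattens it (more random). Illustrative only.
    """
    rng = rng or random.Random()
    # Scale logits by temperature, then softmax into probabilities.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one token id according to that distribution.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Two runs over the same logits with different RNG states can pick
# different tokens - which is where "same prompt, different output"
# comes from. (The logits here are made-up numbers.)
logits = [2.0, 1.5, 0.3]
a = sample_token(logits, rng=random.Random(1))
b = sample_token(logits, rng=random.Random(2))
```

In a real model this runs in a loop: the chosen token is appended to the context and the whole thing is fed back in to produce the logits for the next position.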
This is not true in my experience at all. I never write such a detailed spec for AI - and that is my value as the human in the loop: to be iterative, to steer, and to make decisions. The AI in fact catches more edge cases than I do, and can point me to things I never considered myself. Our productivity has increased manyfold, and code quality has increased significantly, because writing tests is no longer a chore or an afterthought - or the biggest one for us, "test setup is too complicated". All of that is gone. And it is showing in a decrease in customer-reported issues.
> the underlying working principles are the same as GPT-2
I don't think anyone was claiming otherwise. Sonnet is still better at writing code than GPT-2, and worse than Opus. Workflows that work with Opus won't always work with Sonnet, just as you can't use GPT-2 in place of Sonnet to do code autocomplete.
> That's why every chat box has that "Regenerate" button.
Wait, are you doing this in the web chat interface?!
That's definitely not a good way. You need to be using a harness (like Claude Code) where the agent can plan its work, explore the codebase, execute code, run tests, etc. With this sort of setup, your prompts can be short (like 1 to 5 sentences) and still get great results.
> It is NOT the way to work with humans basically because most software engineers I worked with in my career were incredibly smart and were damn good at identifying edge cases and weird scenarios even when they were not told and the domain wasn't theirs to begin with.
I have no clue what AI you're using, but with both Claude and Codex you just explain the outcome, and they are pretty smart at figuring stuff out on complex codebases. You don't even need a paragraph - just say "doing this I got an error".
> NO guarantee either because these models are NOT deterministic in their output. Same prompt different output each time.
So, exactly like humans. But a bit more predictable and way more reliable.
> That's why every chat box has that "Regenerate" button.
If you're using the chat box to write code, that's a human error, not an LLM one. Don't blame "AI" for your ignorance.
> no matter how smart and expensive the model is, the underlying working principles are the same as GPT-2.
Sure. Every machine is a smoke machine if operated wrong enough. This tells me you should not get your insight from random YT videos. As a bit of a nugget: some of the underlying working principles of this chat system also powered search engines; and their engineers also drank water, like Hitler.