Such a great comment, and I agree with all of your points.
For me in a similar vein:
- mar ’24: thinking about how I’d survey the field and implement a hard research task in natural language processing, and then just approximating it well enough with a prompt and a completions API.
- mid ’25: Llama 3 being able to analyze a good-sized codebase I was onboarding onto and synthesize it into diagrams that matched the quality of the ones I’d built by hand with deterministic tools.
- dec ’25: Opus 4.5 basically generating multi-class modules and tests perfectly (syntactically). Found that the errors came from my own under-specification in the prompt. Stopped writing code by hand, mainly because it was good enough and came with tests, docs, build scripts, and other goodies for free.