The “inline comments on a plan” feature is one of the best things in Antigravity, and I’m surprised others haven’t started copycatting it.
Interesting! I feel like I'm learning to code all over again! I've only been using Claude for a little more than a month and until now I've been figuring things out on my own. Building my methodology from scratch. This is much more advanced than what I'm doing. I've been going straight to implementation, but doing one very small and limited feature at a time, describing implementation details (data structures like this, use that API here, import this library etc) verifying it manually, and having Claude fix things I don't like. I had just started getting annoyed that it would make the same (or very similar) mistake over and over again and I would have to fix it every time. This seems like it'll solve that problem I had only just identified! Neat!
Try OpenSpec and it'll do all this for you. SpecKit works too. I don't think there's a need to reinvent the wheel on this one, as this is spec-driven development.
Haha, this is surprisingly close to exactly how I use Claude as well. Quite fascinating that we independently discovered the same workflow.
I maintain two directories: "docs/proposals" (for the research md files) and "docs/plans" (for the planning md files). For complex research files, I typically break them down into multiple planning md files so claude can implement one at a time.
A small difference in my workflow is that I use subagents during implementation to keep the context from filling up quickly.
This is similar to what I do. I instruct an Architect mode with a set of rules related to phased implementation and detailed code artifacts output to a report.md file. After a couple of rounds of review and usually some responses that either tie together behaviors across context, critique poor choices or correct assumptions, there is a piece of work defined for a coder LLM to perform. With the new Opus 4.6 I then select specialist agents to review the report.md, prompted with detailed insight into particular areas of the software. The feedback from these specialist agent reviews is often very good and sometimes catches things I had missed. Once all of this is done, I let the agent make the changes and move onto doing something else. I typically rename and commit the report.md files which can be useful as an alternative to git diff / commit messages etc.
This looks like an important post. What makes it special is that it operationalizes Polya's classic problem-solving recipe for the age of AI-assisted coding.
1. Understand the problem (research.md)
2. Make a plan (plan.md)
3. Execute the plan
4. Look back
I've been running AI coding workshops for engineers transitioning from traditional development, and the research phase is consistently the part people skip — and the part that makes or breaks everything.
The failure mode the author describes (implementations that work in isolation but break the surrounding system) is exactly what I see in workshop after workshop. Engineers prompt the LLM with "add pagination to the list endpoint" and get working code that ignores the existing query builder patterns, duplicates filtering logic, or misses the caching layer entirely.
What I tell people: the research.md isn't busywork, it's your verification that the LLM actually understands the system it's about to modify. If you can't confirm the research is accurate, you have no business trusting the plan.
One thing I'd add to the author's workflow: I've found it helpful to have the LLM explicitly list what it does NOT know or is uncertain about after the research phase. This surfaces blind spots before they become bugs buried three abstraction layers deep.
The biggest roadblock to using agents to maximum effectiveness like this is the chat interface. It's convenience as detriment and convenience as distraction. I've found myself repeatedly giving into that convenience only to realize that I have wasted an hour and need to start over because the agent is just obliviously circling the solution that I thought was fully obvious from the context I gave it. Clearly these tools are exceptional at transforming inputs into outputs and, counterintuitively, not as exceptional when the inputs are constantly interleaved with the outputs like they are in chat mode.
I’ve been using Claude through opencode, and I figured this was just how it does it. I figured everyone else did it this way as well. I guess not!
The separation of planning and execution resonates strongly. I've been using a similar pattern when building with AI APIs — write the spec/plan in natural language first, then let the model execute against it.
One addition that's worked well for me: keeping a persistent context file that the model reads at the start of each session. Instead of re-explaining the project every time, you maintain a living document of decisions, constraints, and current state. Turns each session into a continuation rather than a cold start.
The biggest productivity gain isn't in the code generation itself — it's in reducing the re-orientation overhead between sessions.
In my own tests I have found opus to be very good at writing plans, terrible at executing them. It typically ignores half of the constraints. https://x.com/xundecidability/status/2019794391338987906?s=2... https://x.com/xundecidability/status/2024210197959627048?s=2...
I don't deny that AI has use cases, but boy - the workflow described is boring:
"Most developers type a prompt, sometimes use plan mode, fix the errors, repeat. "
Does anyone think this is as epic as, say, watch the Unix archives https://www.youtube.com/watch?v=tc4ROCJYbm0 where Brian demos how pipes work; or Dennis working on C and UNIX? Or even before those, the older machines?
I am not at all saying that AI tools are all useless, but there is no real epicness. It is just autogenerated AI slop and blob. I don't really call this engineering (although I do agree that it is still engineering; I just don't like using the same word here).
> never let Claude write code until you’ve reviewed and approved a written plan.
So the junior-dev analogy is quite apt here.
I tried to read the rest of the article, but I just got angrier. I never had that feeling watching oldschool legends, though perhaps some of their work may be boring, but this AI-generated code ... that's just some mythical random-guessing work. And none of that is "intelligent", even if it may appear to work, may work to some extent too. This is a simulation of intelligence. If it works very well, why would any software engineer still be required? Supervising would only be necessary if AI produces slop.
Sounds similar to Kiro's specs.
Every "how I use Claude Code" post will get into the HN frontpage.
Which maybe has to do with people wanting to show how they use Claude Code in the comments!
I’m a big fan of having the model create a GitHub issue directly (using the GH CLI) with the exact plan it generates, instead of creating a markdown file that will eventually get deleted. It gives me a permanent record and makes it easy to reference and close the issue once the PR is ready.
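Not the commenter's exact setup, but a minimal sketch of what that could look like: assemble the `gh issue create` invocation from the generated plan file (`gh` supports `--body-file` for reading the issue body from disk; the function names here are made up):

```python
import subprocess

def issue_cmd(title: str, plan_path: str, labels=("plan",)) -> list[str]:
    """Build the `gh issue create` argv for a generated plan file.

    Using --body-file means the plan markdown survives as an issue
    even after the local file is deleted.
    """
    cmd = ["gh", "issue", "create", "--title", title, "--body-file", plan_path]
    for label in labels:
        cmd += ["--label", label]
    return cmd

def file_issue(title: str, plan_path: str) -> None:
    # Runs the real CLI; requires `gh auth login` to have been done.
    subprocess.run(issue_cmd(title, plan_path), check=True)
```

Closing the issue once the PR lands is then just `gh issue close <number>` from the same session.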
Interesting approach. The separation of planning and execution is crucial, but I think there's a missing layer most people overlook: permission boundaries between the two phases.
Right now when Claude Code (or any agent) executes a plan, it typically has the same broad permissions for every step. But ideally, each execution step should only have access to the specific tools and files it needs — least privilege, applied to AI workflows.
I've been experimenting with declarative permission manifests for agent tasks. Instead of giving the agent blanket access, you define upfront what each skill can read, write, and execute. Makes the planning phase more constrained but the execution phase much safer.
Anyone else thinking about this from a security-first angle?
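A toy sketch of the idea, not any real product's format: a per-step manifest naming what each execution step may read, write, and run, with a deny-by-default check. All names and paths here are hypothetical.

```python
from fnmatch import fnmatch

# Hypothetical per-step manifest: each execution step declares the
# files and tools it needs, and nothing else is granted.
MANIFEST = {
    "add-pagination": {
        "read":  ["src/api/**", "src/db/query_builder.py"],
        "write": ["src/api/list_endpoint.py"],
        "exec":  ["pytest"],
    },
}

def allowed(step: str, action: str, target: str) -> bool:
    """Least-privilege check: deny anything the step's manifest
    does not explicitly list."""
    patterns = MANIFEST.get(step, {}).get(action, [])
    return any(fnmatch(target, p) for p in patterns)
```

The planning phase would emit one manifest entry per step, and the execution harness would route every tool call through `allowed` before letting the agent proceed.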
Does anyone still write code? I use agents to iterate on one task in parallel, with an approach similar to this one: https://mitchellh.com/writing/my-ai-adoption-journey#today
But I'm starting to have an identity crisis: am I doing it wrong, and should I use an agent to write any line of code of the product I'm working on?
Have I become a dinosaur in the blink of an eye?
Should I just let it go and accept that the job I was used to not only changed (which is fine), but now requires just driving the output of a machine, with no creative process at all?
Good article, but I would rephrase the core principle slightly:
Never let Claude write code until you’ve reviewed, *fully understood* and approved a written plan.
In my experience, the beginning of chaos is the point at which you trust that Claude has understood everything correctly and claims to present the very best solution. At that point, you leave the driver's seat.
I came to the exact same pattern, with one extra heuristic at the end: spin up a new claude instance after the implementation is complete and ask it to find discrepancies between the plan and the implementation.
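One way to sketch that heuristic: give the fresh instance only the plan and the diff, with a prompt along these lines (the function and wording are illustrative, not the commenter's actual prompt):

```python
def discrepancy_prompt(plan_md: str, diff: str) -> str:
    """Prompt for a fresh instance (no implementation context) to
    compare the plan against what actually got built."""
    return (
        "You have not seen this work before. Compare the plan below "
        "against the diff and list every discrepancy: planned steps "
        "that were skipped, changes that are not in the plan, and "
        "places where the implementation contradicts the plan.\n\n"
        f"## Plan\n{plan_md}\n\n## Diff\n{diff}\n"
    )
```

The point of the fresh instance is that it has no memory of the decisions made during implementation, so it judges the diff only against the written plan.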
My flow is pretty similar, except I also add in these steps at the end of planning:
* Review the plan for potential issues
* Add context to the plan that would be helpful for an implementing agent
The baffling part of the article is all the assertions about how this is unique, novel, not the typical way people are doing this etc.
There are whole products wrapped around this common workflow already (like Augment Intent).
It strikes me that if this technology were as useful and all-encompassing as it's marketed to be, we wouldn't need four articles like this every week
I just use Jesse’s “superpowers” plugin. It does all of this but also steps you through the design and gives you bite sized chunks and you make architecture decisions along the way. Far better than making big changes to an already established plan.
Gemini is better at research, Claude at coding. I try to use Gemini to do all the research and write out instructions on what to do and what process to follow, then use that in Claude. Though I am mostly creating small Python scripts.
Insights are nice for new users but I’m not seeing anything too different from how anyone experienced with Claude Code would use plan mode. You can reject plans with feedback directly in the CLI.
Google Antigravity has this process built in. This is essentially the cycle a developer would follow: plan/analyse - document/discuss - break down tasks/implement. We’ve been using requirements and design documents as best practice since leaving our teenage bedroom lab for the professional world. I suppose this could be seen as our coding agents coming of age.
My process is similar, but I recently added a new "critique the plan" feedback loop that is yielding good results. Steps:
1. Spec
2. Plan
3. Read the plan & tell it to fix its bad ideas.
4. (NB) Critique the plan (loop) & write a detailed report
5. Update the plan
6. Review and check the plan
7. Implement plan
Detailed here:
This is a similar workflow to speckit, kiro, gsd, etc.
I use Amazon Kiro.
The AI first works with you to write requirements, then it produces a design, then a task list.
This helps the AI break the work into smaller chunks; it will work on one task at a time.
I can let it run for an hour or more in this mode. There is then lots of stuff to fix, but it is mostly correct.
Kiro also supports steering files: files that try to lock the AI in to common design decisions.
The price is that a lot of the context is used up by these files, and Kiro constantly pauses to reset the context.
Since the rise of AI systems I really wonder how people wrote code before. This is exactly how I planned out implementation and executed the plan. Might have been some paper notes, a ticket or a white board, buuuuut ... I don't know.
How are the annotations put into the markdown? Claude needs to be able to identify them as annotations and not parts of the plan.
> I am not seeing the performance degradation everyone talks about after 50% context window.
I pretty much agree with that. I use long sessions and stopped trying to optimize the context size, the compaction happens but the plan keeps the details and it works for me.
I have tried using this and other workflows for a long time and had never been able to get them to work (see chat history for details).
This has changed in the last week, for 3 reasons:
1. Claude opus. It’s the first model where I haven’t had to spend more time correcting things than it would’ve taken me to just do it myself. The problem is that opus chews through tokens, which led to..
2. I upgraded my Claude plan. Previously on the regular plan I’d get about 20 mins of time before running out of tokens for the session and then needing to wait a few hours to use it again. It was fine for little scripts or toy apps but not feasible for the regular dev work I do. So I upgraded to 5x. This got me 1-2 hours per session before tokens expired, which was better but still a frustration.

Wincing at the price, I upgraded again to the 20x plan and this was the next game changer. I had plenty of spare tokens per session, and at that price it felt like they were being wasted, so I ramped up my usage. Following a similar process as OP, but with a plans directory with subdirectories for backlog, active and complete plans, and skills with strict rules for planning, implementing and completing plans, I now have 5-6 projects on the go. While I’m planning a feature on one, the others are implementing. The strict plans and controls keep them on track, and I have follow-up skills for auditing quality and performance.

I still haven’t hit token limits for a session, but I’ve almost hit my token limit for the week, so I feel like I’m getting my money’s worth. In that sense spending more has forced me to figure out how to use more.
3. The final piece of the puzzle is using opencode over Claude Code. I’m not sure why, but I just don’t gel with Claude Code. Maybe it’s all the sautéing and flibertygibbering, maybe it’s all the permission asking, maybe it’s that it doesn’t show what it’s doing as much as opencode. Whatever it is, it just doesn’t work well for me. Opencode on the other hand is great. It shows what it’s doing and how it’s thinking, which makes it easy for me to spot when it’s going off track and correct early.
Having a detailed plan, and correcting and iterating on the plan, is essential. Making Claude follow the plan is also essential - but there’s a line. Too fine-grained and it’s not as creative at solving problems. Too loose/high-level and it makes bad choices and goes in the wrong direction.
Is it actually making me more productive? I think it is but I’m only a week in. I’ve decided to give myself a month to see how it all works out.
I don’t intend to keep paying for the 20x plan unless I can see a path to using it to earn me at least as much back.
I agree with most of this, though I'm not sure it's radically different. I think most people who've been using CC in earnest for a while probably have a similar workflow? Prior to Claude 4 it was pretty much mandatory to define requirements and track implementation manually to manage context. It's still good, but since 4.5 release, it feels less important. CC basically works like this by default now, so unless you value the spec docs (still a good reference for Claude, but need to be maintained), you don't have to think too hard about it anymore.
The important thing is to have a conversation with Claude during the planning phase and don't just say "add this feature" and take what you get. Have a back and forth, ask questions about common patterns, best practices, performance implications, security requirements, project alignment, etc. This is a learning opportunity for you and Claude. When you think you're done, request a final review to analyze for gaps or areas of improvement. Claude will always find something, but starts to get into the weeds after a couple passes.
If you're greenfield and you have preferences about structure and style, you need to be explicit about that. Once the scaffolding is there, modern Claude will typically follow whatever examples it finds in the existing code base.
I'm not sure I agree with the "implement it all without stopping" approach and letting auto-compact do its thing. I still see Claude get lazy when nearing compaction, though it has gotten drastically better over the last year. Even so, I still think it's better to work in a tight loop on each stage of the implementation and preemptively compact or restart for the highest quality.
Not sure that the language is that important anymore either. Claude will explore existing codebase on its own at unknown resolution, but if you say "read the file" it works pretty well these days.
My suggestions to enhance this workflow:
- If you use a numbered phase/stage/task approach with checkboxes, it makes it easy to stop/resume as-needed, and discuss particular sections. Each phase should be working/testable software.
- Define a clear numbered list workflow in CLAUDE.md that loops on each task (run checks, fix issues, provide summary, etc).
- Use hooks to ensure the loop is followed.
- Update spec docs at the end of the cycle if you're keeping them. It's not uncommon for there to be some divergence during implementation and testing.
There are a few prompt frameworks that essentially codify these types of workflows by adding skills and prompts
https://github.com/obra/superpowers https://github.com/jlevy/tbd
this is literally reinventing claude's planning mode, but with more steps. I think Boris doesn't realize that planning mode is actually stored in a file.
Doesn’t Claude code do this by switching between edit mode and plan mode?
FWIW I have had significant improvements by clearing context then implementing the plan. Seems like it stops Claude getting hung up on something.
All sounds like a bespoke way of remaking https://github.com/Fission-AI/OpenSpec
It seems like the annotation of plan files is the key step.
Claude Code now creates persistent markdown plan files in ~/.claude/plans/ and you can open them with Ctrl-G to annotate them in your default editor.
So plan mode is not ephemeral any more.
I don't really get what is different about this from how almost everyone else uses Claude Code? This is an incredibly common, if not the most common way of using it (and many other tools).
Funny how I came up with something loosely similar. Asking Codex to write a detailed plan in a markdown document, reviewing it, and asking it to implement it step by step. It works exquisitely well when it can build and test itself.
I do the same. I also cross-ask gemini and claude about the plan during iterations, sometimes make several separate plans.
Hub and spoke documentation in planning has been absolutely essential for the way my planning was before, and it's pretty cool seeing it work so well for planning mode to build scaffolds and routing.
this is exactly how I work with cursor
except that I put notes to plan document in a single message like:
> plan quote
my note
> plan quote
my note
otherwise, I'm not sure how to guarantee that the AI won't confuse my notes with its own plan. One new thing for me is reviewing the todo list; I was always relying on the auto-generated todo list.
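One way to make that quote/note format unambiguous (to the model and to yourself) is to parse it mechanically first. A sketch, assuming lines starting with `>` are plan quotes and everything else is a note:

```python
def split_notes(message: str) -> list[tuple[str, str]]:
    """Pair each quoted plan excerpt (lines starting with '>')
    with the reviewer note that follows it."""
    pairs, quote, note = [], [], []

    def flush():
        if quote or note:
            pairs.append((" ".join(quote), " ".join(note)))

    for line in message.splitlines():
        if line.startswith(">"):
            if note:  # a new quote starts the next quote/note pair
                flush()
                quote, note = [], []
            quote.append(line.lstrip("> ").strip())
        elif line.strip():
            note.append(line.strip())
    flush()
    return pairs
```

With the pairs extracted, you can feed the agent only the notes, each anchored to the exact plan text it refers to, instead of the raw interleaved message.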
The post and comments all read like: here are my rituals to the software God. If you follow them, the God gives plenty. Omit one step and the God is mad. Sometimes you have to make a sacrifice, but that's better for the long term.
I've been in eng for decades but never participated in forums. Is the cargo cult new?
I use Claude Code a lot. Still don't trust what's in the plan will get actually written, regardless of details. My ritual is around stronger guardrails outside of prompting. This is the new MongoDB webscale meme.
It is really fun to watch how a baby makes its first steps, and also how experienced professionals rediscover what standards have been telling us for 80+ years.
https://github.blog/ai-and-ml/generative-ai/spec-driven-deve...