bottlepalm • today at 12:40 AM • 34 replies

I've hit this point with AI where it's not a simple process, but a long drawn out back and forth.

I'll use AI to design the implementation of a medium-sized, cross-cutting feature. Review all the details, maybe iterate on just that. Then implement with Claude 4.7 Max - which runs slower, but does a better job. Then review the implementation, then have Codex GPT 5.5 xhigh fast review it - which almost always finds corner cases. Have Claude fix those - Claude is better at writing intuitive, maintainable code versus Codex's overengineered, shortcut-filled code. (Codex is better at finding/fixing bugs and doing reviews - it's annoyingly pedantic.)

Then repeat with fresh Claude/Codex instances, having them both review the current staged changes, getting feedback, and handling the feedback. Then covering it in tests. I mean, overall I still implement the feature faster than coding it manually, but I spend a majority of the time going back and forth with reviews and handling corner cases, and at the finish I end up with what I feel is a really solid implementation of whatever feature I'm working on. The v1 feature feels more like a v3 given the amount of iteration it already went through.
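
A minimal sketch of the implement, cross-review, fix loop described above, assuming each model is wrapped in a plain Python callable. The names `implement`, `review`, and `fix` are hypothetical stand-ins, not any real Claude or Codex API; how each tool is actually invoked is left open.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-ins: each callable wraps whatever tool you actually use.
Implementer = Callable[[str], str]        # design -> code
Reviewer = Callable[[str], List[str]]     # code -> list of findings
Fixer = Callable[[str, List[str]], str]   # code + findings -> revised code


def review_loop(design: str, implement: Implementer, review: Reviewer,
                fix: Fixer, max_rounds: int = 5) -> Tuple[str, int]:
    """Implement once, then alternate review and fix until the reviewer
    reports nothing or max_rounds is reached.

    Returns (code, number_of_fix_rounds_used).
    """
    code = implement(design)
    for fixes_done in range(max_rounds):
        findings = review(code)
        if not findings:
            return code, fixes_done
        code = fix(code, findings)
    return code, max_rounds


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without any model access.
    def toy_implement(design: str) -> str:
        return f"# {design}\nTODO\nTODO"

    def toy_review(code: str) -> List[str]:
        return ["unfinished TODO"] if "TODO" in code else []

    def toy_fix(code: str, findings: List[str]) -> str:
        return code.replace("TODO", "done", 1)  # resolve one issue per round

    final_code, rounds = review_loop("add feature", toy_implement, toy_review, toy_fix)
    print(rounds)
    print(final_code)
```

The `max_rounds` cap is an assumption added for the sketch: it keeps a pedantic reviewer from looping forever, and the returned round count gives a rough sense of how much back and forth a feature needed.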

Replies

aomix • today at 2:21 AM

Talking the problem to death with the AI before implementation is a nice zone for me. I feel productive, get good results out of the AI, and still largely understand the code. That’s the part of the AI revolution that I feel has made me a better engineer because I argue about design and architecture all day with a robot.

scosman • today at 12:55 AM

Yes, exactly. Too many people ask AI to one-shot complex tasks, and wonder why it behaves like a junior asked to rush something.

I have my own skill: 5 rounds of research/planning/test-planning. Interactive with me in loop for all important decisions. Starts with high level shape, then details. Planning can take 2-3 days of my time, then the implementation agent can take many hours (Opus 4.7). It splits the implementation across many phases/commits, each with its own code-review fix loop. Deep code review at the end can take another hour or two. It opens a PR, Gemini reviews, it reads out and resolves those issues.

Projects still take days or weeks, but 5x faster than doing it all myself.

Edit: the skill - https://github.com/scosman/vibe-crafting

dawnerd • today at 2:56 AM

When I use AI to code, this is pretty close to my workflow too, but I find it ends up taking at best just as long as if I were to write the code myself. In some cases I’ve thrown away what the AI has done and just done it myself. I think that’s just a skill people need to learn - at a certain point you have to cut your losses. I’ve seen some coworkers argue back and forth with an LLM trying to get it to do something. Especially true on simpler changes.

rootnod3 • today at 12:50 AM

And then Anthropic has an outage and you what...have a coffee break until then? All that time babysitting the AIs just to be a little faster but probably with less knowledge/control over what they did?

democracy • today at 2:42 AM

Similar approach, but I also go a step further with some basic manual architecture/high level contract/stubs setups, just to keep it consistent with other systems (and easier reading as well).

Animats • today at 6:14 AM

How much are you spending a day for the tokens to do that?

Ingesting a big project and commenting on it gets expensive. I'm not sure how expensive.

jwillmer • today at 1:19 PM

Check out jwillmer/ai-status on GitHub, @bottlepalm. It helps keep track of all the small fixes that are going on simultaneously. I created the tool for myself since I have similar workflows.

germanptr • today at 8:24 AM

I follow a similar approach and use multiple LLMs per task. The quality improvement is surprisingly large.

Lately I’ve been experimenting with adding an explicit reward function so the models optimize for measurable output quality.

This creates a generate, critique, revise loop where candidate answers compete for a higher score. It feels promising because it reduces the amount of handholding for every task. It is also more fun because part of the review process is embedded in the scoring function, which simplifies the review effort.
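
A rough sketch of such a generate, critique, revise loop, under the assumption that every step is a plain callable and that the reward is a single number per candidate. All four callables (`generate`, `critique`, `revise`, `score`) are hypothetical; in practice the score might be something like the fraction of tests passing, which is also an assumption here.

```python
from typing import Callable, List, Tuple

Generate = Callable[[str], List[str]]   # task -> candidate answers
Critique = Callable[[str], str]         # candidate -> critique text
Revise = Callable[[str, str], str]      # candidate + critique -> new candidate
Score = Callable[[str], float]          # candidate -> measurable quality


def best_of_loop(task: str, generate: Generate, critique: Critique,
                 revise: Revise, score: Score, rounds: int = 3) -> Tuple[str, float]:
    """Let candidates compete: revise each one per round, keep a revision
    only if it scores strictly higher, and return the best candidate seen."""
    candidates = generate(task)
    if not candidates:
        raise ValueError("generate() returned no candidates")
    best = max(candidates, key=score)
    best_score = score(best)
    for _ in range(rounds):
        revised = [revise(c, critique(c)) for c in candidates]
        # A revision replaces its original only when the score improves.
        candidates = [r if score(r) > score(c) else c
                      for c, r in zip(candidates, revised)]
        top = max(candidates, key=score)
        if score(top) > best_score:
            best, best_score = top, score(top)
    return best, best_score
```

Because part of the review lives in `score`, the human only needs to inspect the winner rather than every intermediate draft, which matches the reduced handholding described above.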

alexwwang • today at 11:00 AM

I think you need a skill that has the agent review the code itself, but in a different role, not the one that wrote it. I did some research on this and developed a skill to get things done. So far it works well, though I've decided to prove and improve it with more tests. Dog food is not always delicious, but it's not too bad either.

chrisweekly • today at 1:33 AM

You helpfully cite Claude w/ Opus 4.7 max and Codex w/ GPT5.5 xhigh fast, but what "AI" do you use for the initial design?

vessenes • today at 12:52 AM

I have a very similar workflow, and experience similar temperaments from the agents. I also find anecdotally that they are moderately competitive - you get very different attention from them when you say "competitor X wrote this - please find all bugs" than when you say "you just wrote this - please find all bugs".

sunsetSamurai • today at 2:08 AM

Maybe it's a dumb question, but how do you feed the results of one agent to another? Do you copy and paste manually? Or how do you do it programmatically?

nomel • today at 2:01 AM

I've noticed the following really helps (most important at end):

1. Have claude form the plan and converse with a simple "Note any concerns with this plan" type plan-critic agent.

2. Let it run.

3. After (with everything in context) have it make a future_recommendations.md.

4. Have it make a plan.md to implement those future recommendations, conversing with the plan critic.

5. Clear context. Repeat with 1. Do this loop a few times, with some feedback from actual review thrown in.

But, most importantly, because Claude will aggressively try to maintain code "as is", and happily build on its previous crap, while preferring to hand-roll implementations of everything, add something like this to memories/directives:

* When evaluating designs, default to "pull in the library" over "hand-roll it." Hand-rolling is much worse than a dependency.

* "Precedent" / "matches house style" / "reuses existing pattern" / "consistent with what we already do" are not valid engineering arguments.

* This project is still in the development stage with no real deployments. Mitigation costs and existing precedent are not a concern.

With these, in the last week that I've started using them (after inspecting the insane justifications for leaving crap design decisions in the plans), Claude went from junior level slop that required more oversight than it was worth to something very reasonable, using standard libraries, requiring nudges for architecture rather than pure "wtf!?".

I think they've fine-tuned heavily towards "don't rewrite the codebase", which is completely rational from multiple perspectives, but also not appropriate for new code.

I do enjoy a considerable daily token allowance, so this may not apply to everyone.
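
The five-step loop and the directives above can be sketched roughly as follows. This is only an illustration under stated assumptions: `agent` and `critic` are hypothetical callables, the "context" is modeled as a plain list of strings, and carrying plan.md forward as the only state between cycles is one reading of "Clear context. Repeat with 1."

```python
from typing import Callable, List, Sequence

# Directives taken from the comment above; they seed every fresh context.
DIRECTIVES = [
    'When evaluating designs, default to "pull in the library" over "hand-roll it."',
    '"Precedent" / "matches house style" / "reuses existing pattern" / '
    '"consistent with what we already do" are not valid engineering arguments.',
    "This project is still in the development stage with no real deployments.",
]

Agent = Callable[[List[str], str], str]   # hypothetical: (context, prompt) -> reply
Critic = Callable[[str], str]             # hypothetical: plan text -> concerns


def run_cycle(agent: Agent, critic: Critic, goal: str, review_notes: str = "") -> str:
    """One pass through steps 1-4, starting from a cleared context (step 5)."""
    context: List[str] = list(DIRECTIVES)

    def ask(prompt: str) -> str:
        reply = agent(context, prompt)
        context.extend([prompt, reply])   # everything stays in context within a cycle
        return reply

    def plan_with_critic(prompt: str) -> str:
        plan = ask(prompt)
        concerns = critic(f"Note any concerns with this plan:\n{plan}")
        return ask(f"Revise the plan to address these concerns:\n{concerns}")

    plan_with_critic(f"Form a plan for: {goal}\n{review_notes}")        # step 1
    ask("Implement the plan.")                                           # step 2
    ask("Write future_recommendations.md based on everything so far.")   # step 3
    return plan_with_critic(                                             # step 4
        "Write plan.md to implement those future recommendations.")


def run_loop(agent: Agent, critic: Critic, goal: str, cycles: int = 3,
             review_notes: Sequence[str] = ()) -> str:
    """Repeat the cycle, feeding each plan.md (plus optional human review
    notes for that cycle) into the next fresh context."""
    plan = goal
    for i in range(cycles):
        notes = review_notes[i] if i < len(review_notes) else ""
        plan = run_cycle(agent, critic, plan, notes)
    return plan
```

Each cycle makes six agent calls and two critic calls, so the token cost grows linearly with `cycles`; that is worth keeping in mind given the remark about token allowance.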

rtpg • today at 11:21 AM

tbh I'm just confused at why people ask AI to design features. Do you not know how to design a feature? Do you not know what you want?

This stuff works so much better when you just tell it what to do

comboy • today at 11:17 AM

Have you tried telling Claude to review with a subagent? It too almost always finds corner cases (usually nothing serious, but most of them are things that a good coder would have thought of).

newsicanuse • today at 3:19 AM

At this point one might as well code by themselves

rjprins • today at 7:29 AM

This is exactly my process as well. Although, interestingly, I swap Codex and Claude, having found Claude way more pedantic in its reviews and Codex more pragmatic in its implementation. Maybe it differs per programming language.

onlyrealcuzzo • today at 12:20 PM

> I've hit this point with AI where it's not a simple process, but a long drawn out back and forth.

In my experience, even on a relatively trivial task, you can ask an LLM at least 20 times:

Is this actually done, or only partially implemented? Did you finish x, y, z?

And the LLM will say, no, I'm not done and keep working.

After that, I'll feed the branch to a different LLM, and ask if the implementation matched the design, where it's weak and needs improvements.

Same thing - that feedback will usually only be partially finished for several rounds.

When they all agree it's done - I'll finally look at the code, and there are still typically glaringly obvious problems - duplicate systems that reinvent the wheel, etc - that will typically take more than one prompt to get right...

Getting things right takes roughly 100x as long as getting things almost right with LLMs.

You can tell an LLM to "make me Rust, but easier. Make no mistakes," and it'll plan out a 100 commit process and get something that - somehow - sort of works... but isn't even close to complete.

Still, on a cost basis, you're able to get features that would take you several times longer and cost orders of magnitude more money, and - if you're doing it right - they'll probably do a better job than you would've done (at least for me).

boringstack • today at 3:48 AM

You've essentially promoted yourself from coder to engineering manager, trading syntax fatigue for the mental marathon of refereeing specialized AI developers to ship v3-quality code on the first try.

shakabrah • today at 3:37 PM

Sounds exhausting

skydhash • today at 1:45 AM

That sounds too much like three weeks of work saving you three hours of planning.

In my experience, software engineering is a matter of knowledge. Understanding it and then coming up with a solution. The latter is a flash of insight that comes mostly from experience. Then you gather more information to flesh it out, or brainstorm it with your colleagues.

What you're describing sounds more like a ritual of doing busy work than anything practical. Because tasks vary so much. A feature may be huge, but you take care of it in a day with copy-pasting because you already have all the building blocks in other files. And something may be twenty lines of code, but you spent the whole week sweating on it (concurrency stuff maybe). Those ritualistic workflows sound more like someone imagining software development than actually doing it.

i_love_retros • today at 2:32 AM

This all sounds insane. If it requires so much back and forth with the AI why on earth wouldn't you just write the code yourself? At least then you build the mental model of the code and keep your brain healthy. Reading the comments in here about all the hoops people are having to jump through just to do the same thing they were doing a year ago without AI... and spending a fortune to do it! I think you've all got AI psychosis.

toobulkeh • today at 2:35 PM

I’ve found that it’s a lot like discovering a feature instead of designing it all up front. Like chiseling marble.

I’ve found it useful to write out a list of feedback / issues and have a bunch of sub agents work on them in worktrees, with a loop bringing them all back together. That way it can work for a few hours while I can just review in bulk.
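
One way to set up the fan-out described above, sketched as a helper that only builds the git commands (one worktree and branch per issue, then one merge per branch). The branch naming scheme `fix/<n>-<slug>`, the `../worktrees` directory, and the use of `--no-ff` merges are assumptions made for illustration; how the sub agents are launched inside each worktree is not shown.

```python
import re
from pathlib import PurePosixPath
from typing import List


def slug(text: str) -> str:
    """Turn an issue title into a short branch-safe slug."""
    s = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return s[:40] or "issue"


def worktree_commands(issues: List[str], root: str = "../worktrees") -> List[List[str]]:
    """Build one `git worktree add -b <branch> <path>` command per issue."""
    cmds = []
    for n, issue in enumerate(issues, start=1):
        name = f"{n}-{slug(issue)}"
        path = str(PurePosixPath(root) / name)
        cmds.append(["git", "worktree", "add", "-b", f"fix/{name}", path])
    return cmds


def merge_commands(issues: List[str]) -> List[List[str]]:
    """Build the commands that bring each branch back together."""
    return [["git", "merge", "--no-ff", f"fix/{n}-{slug(issue)}"]
            for n, issue in enumerate(issues, start=1)]


if __name__ == "__main__":
    feedback = ["Fix null check in parser!", "Rename config loader"]
    for cmd in worktree_commands(feedback) + merge_commands(feedback):
        print(" ".join(cmd))
```

Each command list can be passed to `subprocess.run(cmd, check=True)` from the main checkout; keeping command construction separate from execution makes it easy to review what will run before anything touches the repository.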

kilroy123 • today at 8:43 AM

I've settled on the same workflow.

Also, I never multitask with multiple agents doing other stuff. Meh, I focus on just the one task.

henry_bone • today at 6:13 AM

That sounds expensive.

zrn900 • today at 10:37 AM

You could just use Xiaomi Mimo for all of that and it would be cheaper and faster than all of them...

blehn • today at 1:37 PM

This seems like a typical AI workflow, but isn't it dreadfully boring?

jiggawatts • today at 9:57 AM

The funny thing is that you've just described an idealised development process as would be used by effective, skilled humans in a heterogenous team where everyone has a speciality.

If only things were so! If only code was discussed, reviewed, iterated on! If only the "manager" actually read the code, provided actionable feedback, and disseminated PRs to multiple people with diverse skill sets.

(If you can't tell, I'm a jaded consultant desperately trying to make the horse drink the water.)

petesergeant • today at 8:56 AM

The Claude/Codex loop is the current state of the art in my opinion. I've got a silly little harness that glues them together that I have spent all day, every day in for months: https://github.com/pjlsergeant/moarcode

isabelc • today at 3:05 PM

Your comment begins like ai slop.

atoav • today at 7:39 AM

I am not switching the different LLMs as much, but my approach is similar:

1. I write a list of things I want to have without AI support

2. I discuss the list with an LLM, which occasionally reveals obviously missing things I hadn't thought about or just things that would be smart to have. Or sometimes the LLM doesn't get it and wants to funnel me down a commonly walked path, which is a non-goal

3. From that list I draft an implementation plan containing things like how the code shall be structured, which language, libraries, build systems, etc to use. This may even contain some data models and considerations that are more detailed, like for example ideas about how a specific interaction shall be event sourced. I work on that, till I feel a satisfactory level of clarity has been reached

4. Actual writing of code as a back and forth between manual writing, letting an LLM write something and so on. LLMs suck at writing CSS that feels like good UX design to me, so usually templates, layout and CSS will be (re)written entirely by hand

5. Bug-hunting and guessing potential edge cases is one thing where LLMs really shine. Often if the work before that was quality the LLM has an okay time coming up with fixes that are no worse than what I would have done.

DonHopkins • today at 2:42 AM

Low frequency defensive long drawn out back and forth bullet dodging vibe coding should be called "serpentine coding".

The In-Laws (1979): Getting off the plane in Tijuara:

https://www.youtube.com/watch?v=A2_w-QCWpS0

topheroo • today at 1:04 PM

This is where I’m at too lol.

imadierich • today at 3:49 PM

[dead]
