How well does that work for you? It's annoyingly inconsistent for me - I give it instructions on how to fetch JIRA tickets with a script that renders everything relevant to a .md file, and half the time it will still default to reading them via ACLI. I have instructions to run a full build with warnaserror before committing, but I still regularly get pipeline errors because it skips the noincremental part, etc.
I've had that happen before too, and I just added a line to CLAUDE.md or AGENTS.md, something like (adapted to your example):
When asked to fetch JIRA tickets, use the "fetch-jira" skill rather than reading via ACLI
Claude has gotten better about following CLAUDE.md over the last year (it was pretty laughably bad at it previously). You may want to try out pi-agent and create custom extensions instead.
Then codify this behavior into a process that runs automatically.
I.e. keep $repo/origin as a bare repo, then prompt it to create a shell script which creates the worktree, cds into it, runs the script you mentioned, and instantiates pi in it. Potentially define explicit phases for your workflow and show the phase in the UI - and quality gates for transitions. E.g. force the implement-to-finalize transition to only happen if all tests succeeded. Potentially add multiple review phases here too, with different prompts. This progressively gets rid of more and more inconsistencies.
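That worktree bootstrap can be sketched roughly like this - a hedged sketch, where the `new_task` name, the `ORIGIN`/`WORKDIR` variables, and the fetch-jira/pi hand-off are all placeholders for your own setup, not anything pi-agent prescribes:

```shell
# Sketch of a per-task worktree bootstrap. ORIGIN points at the bare clone
# (created once with: git clone --bare <url> "$ORIGIN"); WORKDIR is where
# per-ticket worktrees land. All names here are placeholders.
new_task() {
    task="$1"                                        # e.g. JIRA-1234
    # Each task gets its own worktree and branch off main.
    git -C "$ORIGIN" worktree add "$WORKDIR/$task" -b "$task" main
    cd "$WORKDIR/$task" || return 1
    echo "worktree ready: $PWD"
    # Hand off to the agent -- both commands below stand in for your own tooling:
    # ./scripts/fetch-jira.sh "$task" > TICKET.md    # render the ticket to markdown
    # pi                                             # start pi-agent in the worktree
}
```

The phase definitions themselves would then live in the prompt/extension, not in this script.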
Still not a perfect solution, but on average I've had less and less to address manually with that workflow. Albeit at the cost of tokens (multiple review phases obviously ingest all changes multiple times).
Pi-agent's extensibility is just a lot better than the other harnesses', but you could obviously also introduce a different orchestrator to do the same. For me, pi-agent was just the least effort necessary to get it going.
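The quality gates for phase transitions can be as dumb as a shell function that refuses to advance unless every check passes. A sketch, independent of any harness - the `gate` name and the dotnet commands in the usage comment are just examples:

```shell
# gate: run every check passed as an argument; only report the phase
# transition as allowed if all of them succeed. Checks are plain commands.
gate() {
    for check in "$@"; do
        $check || { echo "gate failed on: $check" >&2; return 1; }
    done
    echo "gate passed: advancing phase"
}

# Example wiring for the implement -> finalize transition:
#   gate "dotnet build --no-incremental -warnaserror" "dotnet test"
```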
On a local model with opencode, I wrote a specific JavaScript way to run SQL queries because bash and psql were error-prone. Whenever I saw it make a mistake, I told it in passive-aggressive tones something like: "please edit AGENTS.md to detail how to use the query.js tool to run a query, and to never use psql". I did this twice until it stopped wanting to use psql.
It seems like if you write the docs yourself, you're not leveraging the chance that the model itself knows the guardrail phrasing that best keeps it from grabbing its default tool.
I have a harness for Claude Code "hooks" (https://code.claude.com/docs/en/hooks), which in my case executes a Go tool in a separate project that runs changes made by claude through a validator with various definable rules (regex, semgrep, etc.). The rules can warn claude or block changes outright.
When I find claude using tools or approaches that I have replaced with more specific ones, I ask claude to add a hook preventing that in the future, pointing it to the instructions of what to do instead.
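As a rough sketch of what such a hook can look like (assuming my reading of the hooks docs is right: the pending tool call arrives as JSON on stdin, and exit code 2 blocks the call while the stderr message goes back to claude; the query.js tool is borrowed from the sibling comment purely as an example replacement):

```shell
# deny_psql_hook: a PreToolUse-style hook body. Reads the tool-call JSON
# from stdin and blocks any command mentioning psql, steering claude to a
# wrapper instead. The substring match is deliberately naive; a real rule
# would parse the JSON (e.g. with jq) before deciding.
deny_psql_hook() {
    input=$(cat)                   # tool-call JSON from Claude Code
    case "$input" in
        *psql*)
            echo "psql is blocked here; use 'node query.js <sql>' instead" >&2
            return 2 ;;            # exit code 2 blocks the tool call
    esac
    return 0
}
```

You'd register it as a command under a PreToolUse matcher in your Claude Code settings.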
And of course I wrapped all that up in a Skill so it knows what approaches to take to add things to hooks.
It becomes fairly trivial to incrementally stop it making repeated mistakes like this.