Interesting things Dirac does:
1. Uses an optimized version of Hash-Anchored edits for file editing (https://dirac.run/posts/hash-anchors-myers-diff-single-token)
2. Utilizes language's AST to decide what to fetch into context, entirely avoids large code file reads
3. Batches all operations. Does large number of reads/edits simultaneously (you can see a video demo for deepseek-v4-flash here https://www.reddit.com/r/LocalLLaMA/comments/1suhdki/tested_...)
4. Allows the model to execute code to analyze things on the fly, so the model can simply write bash/python/perl script to accomplish things where appropriate
5. A lot of context curation and opportunistic context updates, i.e. put into context anything that you are certain model would ask next
Anchor based editing requires injecting new anchors to the context, and dirac does so via a diff. So how is this more efficient (token-wise) than search and replace?? Even at a single token per hash. Also, code is read more than written so these just add up. I experimented once with stable anchors, albeit longer than a single token, and found it a downgrade.
My conclusion is that the efficiency dirac sees comes mainly from showing file skeleton by default
For the hash-anchored edits, sharing here Can Bölük's original post about the idea https://blog.can.ac/2026/02/12/the-harness-problem/
> Batches all operations. Does large number of reads/edits simultaneously...
I wasn't sure what this meant, so I looked at the source. It seems to be referring to tool APIs being designed around taking multiple targets as a list parameter, instead of hoping the model makes appropriately parallel tool calls. (This matches my experience btw, models are reluctant to make a large number of parallel calls simultaneously, and this seems more pronounced with weaker models.)
Instead of burning tokens on SOTA models, why not use a dirt-cheap specialised model for file editing?
Where the SOTA model just makes a cheaper model to make edits, and it does so.
> Utilizes language's AST to decide what to fetch into context,
Does that mean that it's only going to work with certain langauges for which it has parsers available?
Is there a complete list of the tools somewhere? I'm interested in how you chose to expose the AST specifically. In my own harness attempts I wanted to keep the number of tools absolutely minimal and briefly experimented with including an AST lib to use via an execute_python tool (plus some examples in the system prompt). Results were mixed though, with most models preferring ripgrep.
It would be really cool to do a causality investigation to determine which one of these boosts it so much / quantify how much each matters. Who knows, they may all interact in a sum-is-greater-than-parts way that only improves the score when shipped altogether.
How are the two token anchors chosen when the initial 1700 single token anchors run out? I'm assuming just a 2 word combination from the 1700.
Did you consider incorporating ast-grep or gritql?
Congratulations, great work.
[flagged]
I always wondered why AST's were not more of a part in both editing and scoping of changes/parsing code. I thought I read an article where they said 'grep' was just as effective. It kinda made sense for the case they were talking about.