My experience exactly! I've recently become so tired of the Claude harness that I switched to OpenCode (which is extremely good compared to Claude Code). However, OpenCode is also tedious to modify, and it inherits all the "good stuff": treating agents as Markdown files, and all the dancing around with hooks/plugins/skills scattered all over the place. After getting stuck again and again, I've ultimately concluded that this has to be solved by writing my own damn coding agent, with extensibility that's acceptable for real-world engineering.
The harness is where open source should shine. It doesn't require millions of dollars of compute, but the search space is vast and explorable on a limited budget.
I use small models, and I like to give them a TOC rather than raw lines; I wonder how that would stack up against the hashline approach.
read_toc tool:

...
{
  "name": "mcp",
  "qualified_name": "mcp",
  "type": "constant",
  "docstring": null,
  "content_point": "src\\mcps\\code_help\\server.py::17::18::python::mcp",
  "is_nested": false
},
{
  "name": "handler",
  "qualified_name": "handler",
  "type": "constant",
  "docstring": null,
  "content_point": "src\\mcps\\code_help\\server.py::18::19::python::handler",
  "is_nested": false
},
...

update_content tool:

{
  "content": "...",
  "content_point": "src\\mcps\\code_help\\server.py::18::19::python::handler",
  "project_root": ....
}

really enjoyed reading this, although I'm a dumb farmer and it took me a while to understand lol
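For anyone else puzzling over those content_point strings: a minimal sketch of how a tool like that could parse and apply them. The fields are assumed to be path::start_line::end_line::language::symbol, the span convention is a guess from the 17::18 / 18::19 examples above, and parse_content_point / apply_update are hypothetical helpers, not the tool's actual code.

from dataclasses import dataclass

@dataclass
class ContentPoint:
    path: str
    start_line: int  # assumed 1-based, inclusive
    end_line: int    # assumed exclusive, given the one-line 17::18 span
    language: str
    symbol: str

def parse_content_point(raw: str) -> ContentPoint:
    # rsplit so the "::" separators never collide with the Windows path.
    path, start, end, language, symbol = raw.rsplit("::", 4)
    return ContentPoint(path, int(start), int(end), language, symbol)

def apply_update(cp: ContentPoint, new_content: str) -> None:
    # What update_content plausibly does: splice new source over the span.
    with open(cp.path, encoding="utf-8") as f:
        lines = f.readlines()
    lines[cp.start_line - 1 : cp.end_line - 1] = [new_content.rstrip("\n") + "\n"]
    with open(cp.path, "w", encoding="utf-8") as f:
        f.writelines(lines)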
Great article, recommend reading all of it.
> Why bother, you ask? Opus may be a great model, but Claude Code to this day leaks raw JSONL from sub-agent outputs, wasting hundreds of thousands of tokens. I get to say, “fuck it, subagents output structured data now”.
This is why I find the ban on using Claude subscriptions with other harnesses so heinous. The harness they're forcing on everyone has tons of big issues, including wasting massive numbers of tokens. It's very much in line with intentionally refusing to adhere to standards, in the most IE6 way possible.
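Concretely, "subagents output structured data" can be as simple as forcing the sub-agent's final message through a small fixed schema before the parent ever sees it, instead of pasting the raw transcript back into the context window. A minimal sketch; the SubagentResult fields and parse_subagent_output are made-up illustrations, not anyone's actual API:

import json
from dataclasses import dataclass, field

@dataclass
class SubagentResult:
    summary: str  # short digest for the parent agent
    files_touched: list[str] = field(default_factory=list)
    follow_ups: list[str] = field(default_factory=list)

def parse_subagent_output(raw: str) -> SubagentResult:
    # Validate against the fixed schema; anything that doesn't fit is
    # rejected rather than leaked wholesale into the parent's context.
    data = json.loads(raw)
    return SubagentResult(
        summary=str(data["summary"]),
        files_touched=[str(p) for p in data.get("files_touched", [])],
        follow_ups=[str(s) for s in data.get("follow_ups", [])],
    )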
Is there a skill file I can use for these edits?
I feel a lot of confusion about which coding harness is best and which options to use. tbh I have mostly used standard aider, and I don't know what the consensus on this tool is.
I feel like I want to write my own, and that in the future a lot of developers will have custom, highly personalized harnesses, since each user of these models wants to use them in a way that's unique to their own brain. It's much like how Emacs is so great for its customization, yet one person's Emacs config is often not what another wants; they may want only a subset, and then they write their own features.
As an aside, what's the feeling on all the various AI coding tools? Does aider suck? Are aider-ce/cecli better, or are the bespoke tools for each model, like Claude Code, better?
I agree with this article completely, nice to see it presented quantitatively.
> re: "only" the harness changed
In our experience, AIs are like amnesiacs who can barely remember what they did three minutes ago (their last autonomous actions might still be in their context, if you're lucky), with no chance of remembering what they did three days ago. As such, the "harness" determines their entire memory and is the single most important determinant of their outcome.
The best harness is a single self-contained, well-commented, obvious, tiny code file, followed by: a plain explanation of what it does and what it's supposed to do; the change request; how you want it done (stated with so much force and confidence that the AI is afraid of getting yelled at if it does anything else); and a large amount of text devoted to asking the AI not to break what is already working. Then a request to write a test that passes. Then a request for its judgment on whether or not it broke what was already working. All in one tiny, crisp prompt.
With such a harness, it manages not to break the code one time in twenty. If you use reverse psychology and ask it for the opposite of what you want, the odds rise to fifty-fifty that you'll get what you're actually after.
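In code, that recipe is just deterministic prompt assembly. A minimal sketch; the section wording below is illustrative, not a tested incantation:

def build_prompt(code: str, explanation: str, change_request: str, how: str) -> str:
    # Mirrors the recipe above, section by section, in one crisp prompt.
    sections = [
        "Here is the entire program, self-contained in one small file:\n" + code,
        "What it does and what it is supposed to do:\n" + explanation,
        "Change request:\n" + change_request,
        "Do it exactly this way and no other way:\n" + how,
        "Do NOT break anything that already works. Leave working code alone "
        "unless the change request requires touching it.",
        "Then write a test that passes.",
        "Finally, state your judgment: did you break anything that was "
        "already working?",
    ]
    return "\n\n".join(sections)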
Don't believe me? You can watch the livestream (see my previous comments).
Baby steps toward Utopia.
> Why bother, you ask? Opus may be a great model, but Claude Code to this day leaks raw JSONL from sub-agent outputs, wasting hundreds of thousands of tokens. I get to say, “fuck it, subagents output structured data now”.
The VC economics are creating a reality distortion field in which Anthropic is incentivized to burn more tokens so it can rent more GPUs so it can raise more investment, and I am incentivized to pipe my LLM inputs into `claude -p` and blast 50KB of useless proompt at it so they don't ban me from their 95%-discounted API endpoint.
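That piping pattern is a one-liner around Claude Code's print mode. A minimal sketch, assuming the documented `claude -p "query"` form (flags may differ across CLI versions):

import subprocess

def ask_claude(prompt: str) -> str:
    # -p / --print runs Claude Code non-interactively and prints the reply.
    result = subprocess.run(
        ["claude", "-p", prompt],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout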