Why can't Claude Code generate an effective harness for us by inspecting the codebase?
I tried defining CLAUDE.md (or AGENTS.md), skills, and plugins, but I'm not getting the effectiveness others claim to get. With the LSP plugin, for example, CC doesn't use the LSP's symbol renaming and instead edits files one by one, slowly; or it doesn't invoke a skill even when I explicitly tell it to remember to invoke it whenever the prompt contains a specific cue.
Am I using it wrong? Is there a robust example harness I can copy?
> Am I using it wrong?
I stopped using `/init` and having CLAUDE|AGENTS.md files that explained the codebase. The only thing I kept was how it should explore the codebase and use `git log` when researching, which is probably redundant too. I can't figure it out either.
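For reference, the exploration section I kept looks roughly like this (my paraphrase, not the literal file contents):

```
## Researching the codebase

- Before editing, explore related files with grep/glob rather than assuming structure.
- Use `git log --follow <file>` and `git log -S <symbol>` to understand why code is the way it is.
```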
The codebase I work on is roughly 100k LOC, so idk if that's considered large. It's the largest repo I've personally worked on.
What seems to work in some cases is hooks with scripts that feed into the context window (I've had to strip out some unnecessary linter messaging to limit context use). Linters and other language-specific checkers can be installed from the OS package repository and called via script.

Also, the model + skill context together can make a difference. Skills that "worked" on 4.6 may not work as well on 4.7, which seems to require more explicit direction but is more reliable than 4.6. Updating skills might help too; test and run before/after to check. CC also injects unnecessary tool calls into context, so you may need to suppress tasks if you're a beads fan, for example.
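For what it's worth, the "strip out unnecessary linter messaging" part can be a tiny filter the hook pipes the linter through. A minimal sketch in Python (the "error" keyword heuristic and the 20-line cap are assumptions; adapt them to your linter's output format):

```python
import sys

def filter_lint_output(lines, max_lines=20):
    """Keep only error-level lines and cap the total, so hook
    output doesn't flood the model's context window."""
    errors = [ln for ln in lines if "error" in ln.lower()]
    return errors[:max_lines]

if __name__ == "__main__":
    # Read linter output piped in by the hook; print the filtered view.
    filtered = filter_lint_output(sys.stdin.read().splitlines())
    if filtered:
        print("\n".join(filtered))
```

Wired up, this sits between the linter and the hook's stdout, e.g. something like `ruff check "$file" | python filter_lint.py` as the hook command (exact wiring depends on your setup and the hook event you use; check the Claude Code hooks docs for the event payload format).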
This is a pain point that has existed for years now, and it's still not solved at all.
"If A, do X. Do B, C, D. Do A" - and it just never uses X because it "forgot".
You just can't trust that the time you spend building rules will actually pay off; in fact, you can trust that it will fail you sooner or later.
RAG, harnesses, skills... all of it was supposed to fix this, but in reality none of it ever has.