We really need some consolidation around commands, skills, subagents, and plugins. For example, if you want to, say, review code, you have five options now:
- Write a .claude/commands/review.md. Simple but deprecated.
- Use a /code-review skill, either one you install or one you just write yourself (it's just Markdown, after all).
- Use the /pr-review subagent. Also just Markdown, but it runs "in the background" and "in parallel", so it must be better, I guess.
- Install the /code-review plugin. This just installs the skills and subagents above.
- Simply ask Claude to review the code. Probably works almost as well as the above in most situations.
They are all just variations of "insert a canned prompt", varying only along the dimensions of (a) how and where the prompt is installed and from where it is sourced, and (b) which context or contexts the prompt runs in. There's not much advice here about which option is best, and no clear best practices seem to have emerged yet either. Personally, I find just asking Claude to review the code works well enough.
Some of the advice here is also off. For example:
"Install a language server plugin. Type errors and unused imports caught after every edit. Highest-impact plugin you can install."
I work mostly with Rust, Python, and Dart, and followed similar advice, installing LSPs for all three in both Claude Code and Codex. Two months later, after heavy development in all three languages and hundreds of sessions - and frequently running out of RAM due to all the rust-analyzer, Dart analysis server, and ty language servers the harnesses were spinning up - I checked the session logs to see how often the agents were actually invoking the LSP tools. The answer was they had invoked them literally once the entire time. I uninstalled all my LSPs and haven't looked back. The agents do just fine using ripgrep and calling cargo clippy, dart analyze, ty check, etc. themselves.
> They are all just variations of "insert a canned prompt", varying only along the dimensions of (a) how and where the prompt is installed and from where it is sourced, and (b) which context or contexts the prompt runs in. There's not much advice here about which option is best, and no clear best practices seem to have emerged yet either. Personally, I find just asking Claude to review the code works well enough.
The subagent approach is structurally different from the others because it runs with clean context. That has three major effects:
1. All other things being equal, it will result in a lower cost-to-solution, because the cost of an LLM session scales quadratically: the entire accumulated context is billed again as input (or cached input) on every new round.
2. The review model will not be able to 'cheat' by retaining assumptions from the main session, such as "x must be done like y." For people, this is why having a separate person perform code review (or, if not possible, reviewing code after a mind-clearing break) is handy; how well this analogy carries over to LLMs is unclear, but it seems reasonable.
3. The main model will only see the results of the review, not the detailed reasoning that leads up to it. On one hand this avoids more context pollution, but on the other hand it might lead to duplicative logic to re-discover the mechanics behind bugs found.
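To put rough numbers on point 1, here is a back-of-the-envelope sketch. It assumes every round re-sends the whole accumulated context and adds a fixed number of tokens. The round counts and token sizes are made up for illustration. It also ignores cache discounts, the tokens the subagent spends reading the diff, and the summary it hands back to the main session.

```python
def session_input_tokens(rounds: int, tokens_per_round: int) -> int:
    """Total input tokens billed over one session, assuming each round
    re-sends the entire accumulated context (simplified model)."""
    total = 0
    context = 0
    for _ in range(rounds):
        context += tokens_per_round  # each round appends new material
        total += context             # and the full context is paid for again
    return total


# Hypothetical workload: 40 rounds of implementation, 10 rounds of review,
# 2,000 tokens added per round.
IMPL_ROUNDS, REVIEW_ROUNDS, TOKENS_PER_ROUND = 40, 10, 2_000

# Review done inline: one long session of 50 rounds.
inline = session_input_tokens(IMPL_ROUNDS + REVIEW_ROUNDS, TOKENS_PER_ROUND)

# Review delegated to a subagent: the main session stops at 40 rounds and
# the review runs as a fresh 10-round session with clean context.
delegated = (
    session_input_tokens(IMPL_ROUNDS, TOKENS_PER_ROUND)
    + session_input_tokens(REVIEW_ROUNDS, TOKENS_PER_ROUND)
)

print(f"inline:    {inline:,} input tokens")
print(f"delegated: {delegated:,} input tokens")
```

Under these assumptions the inline review bills 2,550,000 input tokens and the delegated one 1,750,000. The gap grows with the length of the main session, since the review rounds no longer drag that context along.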
> I checked the session logs to see how often the agents were actually invoking the LSP tools. The answer was they had invoked them literally once the entire time.
I think the intent behind 'install a language server plugin' is that these tools should lint automatically after every edit, without waiting for an explicit call from the LLM.
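For illustration only, the "lint automatically after every edit" idea could look something like the sketch below. This is not how any particular plugin or harness actually implements it. `checker_for` and `lint_after_edit` are hypothetical names, and the commands are simply the ones mentioned upthread (cargo clippy, dart analyze, ty check).

```python
import subprocess
from pathlib import Path
from typing import Optional

# Checker commands mentioned upthread, keyed by file extension.
# The mapping itself is an assumption made for this sketch.
CHECKERS = {
    ".rs": ["cargo", "clippy"],
    ".dart": ["dart", "analyze"],
    ".py": ["ty", "check"],
}


def checker_for(path: str) -> Optional[list]:
    """Return the checker command for an edited file, or None if unknown."""
    return CHECKERS.get(Path(path).suffix)


def lint_after_edit(path: str) -> str:
    """Hypothetical post-edit hook: run the checker for the edited file and
    return its output, so a harness could append it to the model's context
    without the model having to ask for it."""
    cmd = checker_for(path)
    if cmd is None:
        return ""  # nothing to run for this file type
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout + result.stderr
```

The difference from the LSP-tool setup is who initiates the check. Here the harness runs it on every edit, rather than waiting for the model to decide to call a tool.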
I just consider this a temporary phase, because models are dumb and harnesses are not there yet.
When I need a code review I should just be able to say “review it”. The model should figure out which plugins, skills, etc. to use.
> They are all just variations of "insert a canned prompt", varying only along the dimensions of (a) how and where the prompt is installed and from where it is sourced, and (b) which context or contexts the prompt runs in.
Yes, yes, thank you, sometimes I feel like I'm taking crazy pills.
The industry and overall developer ecosystem have become absolutely mesmerized by the act of creating and popularizing little bits of protocol and machinery to dress up the act of inserting text into the machine. Yes, they're useful and provide some consistency, but I'm convinced that the main reason people like them so much is that they put a thin "I'm still a programmer wielding complicated tools that laypeople don't understand" coating over the fact that we're all just asking the AI nicely to do a thing.
I imagine that the companies that earn money from input and output tokens really, really like excessive skills because of the sheer amount of potentially pointless constraints and instructions being sent back and forth ("don't store passwords as plaintext", "always check for syntax errors" and other obvious guidelines).
Hey, Boris from the CC team here. I agree, we're working on consolidating these. Going forward it will just be the built-in /code-review skill.
Here's how to use the skill on the latest version:
/code-review # do a balanced code review. checks for bugs and inconsistencies, poor code quality, duplication, band aids, etc.
/code-review --fix # same as above, but also fix the issues
# choose an explicit effort level (defaults to your current effort level). all of these also accept --fix:
/code-review low
/code-review medium
/code-review high
/code-review xhigh
/code-review max
# do an expensive and extremely thorough review (reliably catches >99% of bugs, costs $3-20 per review depending on complexity):
/code-review ultra
Open to feedback or ideas for how to make these even nicer to use.