1) Code slowly. Let the agent(s) discover and plan first, then launch the swarm on the confirmed implementation steps.
2) Use LSP. If nothing else works, you can usually connect it via MCP. I think all coding agents support this by now.
3) Add hooks if you want to stop the coding agent from doing something nasty, or from hallucinating and giving incomplete output. TDD and any verification tool you can think of are your friends.
4) Skills have been a bit hit or miss for me, especially with less capable models. So have plugins. If you know how they work, please explain them to me.
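For point 3, here is a rough sketch of what a "block the nasty stuff" hook can look like. It assumes the agent passes the pending tool call as JSON with the shell command under `tool_input.command`, and that exit code 2 means "block" (this is modeled on Claude Code's hook convention; other agents may differ, so check your agent's docs). The deny-list and the `decide` helper are made up for illustration.

```python
import re

# Hypothetical deny-list; tune it to your own project.
BLOCKED_PATTERNS = [
    r"\brm\s+-rf\b",
    r"\bgit\s+push\b.*--force",
    r"\bgit\s+reset\s+--hard\b",
]


def decide(event: dict) -> tuple:
    """Return (exit_code, message) for one pending tool call.

    Assumed convention: 0 lets the call through, 2 blocks it and the
    message is shown to the agent as the reason.
    """
    command = event.get("tool_input", {}).get("command", "")
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, command):
            return 2, f"blocked by hook: command matches {pattern!r}"
    return 0, ""


# Wiring it up as a script (assumed protocol, event arrives on stdin):
#   import json, sys
#   code, message = decide(json.load(sys.stdin))
#   if message:
#       print(message, file=sys.stderr)
#   sys.exit(code)
```

The same shape works for verification hooks: instead of matching a command, run your tests or linter after an edit and return the blocking code when they fail.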
With LSP, the model doesn't get stuck in a "let me grep this specific pattern across a million files again and again" loop and burn your entire weekly budget by Monday at noon.
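To make the grep-vs-LSP difference concrete, this is roughly what a single "find all references" question to a language server looks like on the wire. The `textDocument/references` method and the `Content-Length` framing come from the LSP spec; the file URI, the position, and the helper names are placeholders I made up.

```python
import json


def frame(payload: dict) -> bytes:
    """Wrap a JSON-RPC payload in the LSP base-protocol header."""
    body = json.dumps(payload).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def references_request(request_id: int, uri: str, line: int, character: int) -> dict:
    """Build a textDocument/references request (positions are zero-based)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "textDocument/references",
        "params": {
            "textDocument": {"uri": uri},
            "position": {"line": line, "character": character},
            "context": {"includeDeclaration": False},
        },
    }


# Placeholder file and cursor position, purely for illustration.
message = frame(references_request(1, "file:///repo/src/app.py", 41, 8))
```

A real session needs the `initialize` handshake first, and an MCP bridge would hide all of this behind a tool call. The point is that the agent gets one precise answer from the server instead of rescanning the repo.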
I'm also curious if anyone has done something cool with memory and context management that doesn't require a custom llama.cpp implementation. I also don't have the heart to let the swarm do it end to end, because LLM-generated code from less capable models really does smell, and no amount of spec-driven development or style guidelines stuffed into CLAUDE.md seems to fix it.
There's `honcho` for memory. I'm starting to play with it now, but I feel like I've seen a lot of projects pop up in that space.