The thing I do not get with these routers is that you will have more cache misses (5 min TTL). And if there is one thing I’ve learned, it's that using the cache is crucial.
How does this router translate to $$$ when developing?
Artefact-based workflows solve this problem, and I think it’s more effective to go in that direction.
I still have Claude Code because Opus makes good plans, but I hand the plan over to M3 on Pi with 99.9% cache hits on a long session. Lovely. Pi then makes a summary file that Opus can use to review the code/context.
But you do need the models to write down what they did, so that compaction and cleared sessions can work off a nice, concise document.
And if you are simply using Claude Code, then /advisor is what you want: a sub-agent with a much cleaner context is spawned to handle something -> not cached per se, but much cheaper to run.
I’d stay away from workflows that automatically route between models unless you can afford the cache misses. That’s also why GLM 5.x is costing me much more: I don’t get good caching with it.
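To put rough numbers on why a miss hurts on a long session, here's a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the function name, the $3 per million tokens price, and the read/write multipliers are placeholders, so plug in your own provider's rates.

```python
def turn_input_cost(context_tokens, new_tokens, price_per_mtok,
                    cache_hit, read_mult=0.1, write_mult=1.25):
    """Rough input cost in dollars for a single turn.

    read_mult and write_mult are placeholder multipliers on the base
    input price (assumed, not any provider's published rates).
    """
    per_token = price_per_mtok / 1_000_000
    if cache_hit:
        # The existing context is read from cache at a discount;
        # only the new tokens are written to the cache.
        billed = context_tokens * read_mult + new_tokens * write_mult
    else:
        # Cold cache: the whole prompt is processed and written again.
        billed = (context_tokens + new_tokens) * write_mult
    return billed * per_token


# Hypothetical long session: 100k tokens of context, 2k new tokens per turn.
hit = turn_input_cost(100_000, 2_000, 3.0, cache_hit=True)
miss = turn_input_cost(100_000, 2_000, 3.0, cache_hit=False)
print(f"hit: ${hit:.4f}  miss: ${miss:.4f}  ratio: {miss / hit:.1f}x")
```

With these made-up numbers, a single miss costs about as much as ten cached turns, which is the gap a router has to earn back every time it switches models.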
You're right, and that's why we built the router to be cache aware! Once it starts using one model, the threshold to switch to another model goes up, because the cost savings or quality increase has to outweigh the additional cost of the cache miss.
This is the key thing that other routers we've seen miss: they're stateless, so for a coding agent use case you end up spending more money due to all the cache misses.
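For anyone curious what a cache-aware switch decision could look like, here's a minimal sketch. It is not the actual router: the `Model` fields, the multipliers, the `quality_value` knob, and the assumption that the context stays a fixed size for the rest of the session are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    price_per_mtok: float  # hypothetical input price, $ per million tokens
    quality: float         # hypothetical task quality score in [0, 1]


def should_switch(current, candidate, cached_tokens, expected_turns,
                  read_mult=0.1, write_mult=1.25, quality_value=0.0):
    """Return True if switching models pays for the cache miss.

    Assumes a fixed context size over the remaining turns.
    quality_value is a made-up knob: dollars you'd pay per point of quality.
    """
    cur = current.price_per_mtok / 1_000_000
    new = candidate.price_per_mtok / 1_000_000

    # Staying: every remaining turn reads the warm cache at a discount.
    stay_cost = expected_turns * cached_tokens * read_mult * cur

    # Switching: one cold write of the whole context on the new model,
    # then discounted reads for the turns after that.
    switch_cost = (cached_tokens * write_mult * new
                   + (expected_turns - 1) * cached_tokens * read_mult * new)

    quality_gain = (candidate.quality - current.quality) * quality_value
    return (stay_cost - switch_cost) + quality_gain > 0


big = Model("big", price_per_mtok=3.0, quality=0.9)
small = Model("small", price_per_mtok=1.0, quality=0.9)

# One turn left: the cold write on the cheaper model costs more than it saves.
print(should_switch(big, small, cached_tokens=100_000, expected_turns=1))
# Twenty turns left: the cheaper model amortizes the miss.
print(should_switch(big, small, cached_tokens=100_000, expected_turns=20))
```

The point of the sketch is that the decision depends on state (how much is cached, how many turns are likely left), which a stateless router can't see.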