We haven't experimented with routing to local LLMs much. Technically they benefit from the cache too although it's more a question of latency than cost. But tbh I haven't seen great results in the wild from working with local LLMs for coding - curious if you've had any success with them?
I generally used them for token saving purposes, just using them for repetitive tasks, gated and supervised by claude. So its planned and verified by better models, but implementation falls on local ones. It has been pretty effective for me, as long as I spend a bit more initially on splitting complex tasks further down