I ran into a problem at work recently: we are given access to a range of models, up to a full Claude Opus 4.8, but with a monthly budget of only 100k tokens. We are also given access to Gemini 3.5 Flash & 3.1 Pro with a daily budget of 50M tokens, but without tool calling. I'd love to hook Claude Code (or Pi) up to the Gemini models, but the lack of tool calling makes that quite difficult. I've been planning out how an intelligent router could use a token-efficient tool-calling model (possibly a small local open-weights model) to handle the basic tools, like reading from the file system or interfacing with MCP servers, so that it gathers the context, and then send the built-up context to the Gemini model, where I have a nearly unlimited (for my use cases) token budget.
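Roughly what I have in mind, as a minimal sketch rather than a working integration. Everything here is a hypothetical placeholder: `ToolCall`, `gather_context`, `route`, and the `planner` / `answerer` callables stand in for the small tool-calling model and the Gemini call, and are not any real router's or vendor's API.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, List, Optional


@dataclass
class ToolCall:
    """A single tool request emitted by the small tool-calling model."""
    name: str
    arg: str


# Hypothetical interfaces: the planner is the cheap tool-capable model,
# the answerer is the large-budget model that cannot call tools.
Planner = Callable[[str, List[str]], Optional[ToolCall]]
Answerer = Callable[[str], str]
Tools = Dict[str, Callable[[str], str]]

# Assumed default tool set; MCP-backed tools would be registered the same way.
DEFAULT_TOOLS: Tools = {
    "read_file": lambda path: Path(path).read_text(),
}


def gather_context(task: str, planner: Planner, tools: Tools,
                   max_steps: int = 8) -> List[str]:
    """Let the planner request tools until it returns None or hits max_steps."""
    context: List[str] = []
    for _ in range(max_steps):
        call = planner(task, context)
        if call is None:
            break
        tool = tools.get(call.name)
        result = tool(call.arg) if tool else f"unknown tool: {call.name}"
        context.append(f"[{call.name}({call.arg})]\n{result}")
    return context


def route(task: str, planner: Planner, answerer: Answerer,
          tools: Tools = DEFAULT_TOOLS) -> str:
    """Phase 1: gather context cheaply. Phase 2: one tool-free call with all of it."""
    context = gather_context(task, planner, tools)
    prompt = "\n\n".join(context + [f"Task: {task}"])
    return answerer(prompt)
```

The point of the split is that only the short planning turns hit the scarce budget, while the one large prompt goes to the model with the big allowance.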
Could your router handle this?
Yes, we can route to Gemini models too, and we handle all the translation complexity there!
Monthly budget of 100k Opus tokens? So $2.50 worth?
I’m curious how a workplace ends up with a model policy like this. It seems like you’d spend more time working out how to use a tiny number of Opus tokens than it would take to do the work yourself.