A useful feature would be a slow mode that runs on low-cost compute at spot pricing.
I’ll often kick off a process at the end of my day, or over lunch. I don’t need it to run immediately. I’d be fine if it just ran on their next otherwise-idle GPU at much lower cost than the standard offering.
OpenAI offers that, or at least used to. You can batch all your inference and get much lower prices.
Yep, same. I often wonder why this isn’t a thing yet: running some tasks overnight at e.g. 50% of the cost. There’s the batch API, but that isn’t integrated into e.g. Claude Code.
The discount MAX plans are already on slow-mode.
> I’ll often kick off a process at the end of my day, or over lunch. I don’t need it to run immediately. I’d be fine if it just ran on their next otherwise-idle GPU at much lower cost than the standard offering.
If it's not time-sensitive, why not just run it on CPU/RAM rather than GPU?
https://platform.claude.com/docs/en/build-with-claude/batch-...
> The Batches API offers significant cost savings. All usage is charged at 50% of the standard API prices.
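For anyone who hasn't used it, a rough sketch of what submitting work to the linked Batches API looks like. The model name and the `custom_id` scheme here are illustrative assumptions; check the docs above for the current request shape and SDK method names.

```python
# Sketch of packaging prompts for Anthropic's Message Batches API.
# Each request carries a custom_id so results (which can arrive up to
# 24 hours later, at the 50% discount) can be matched back to prompts.

def build_batch_requests(prompts, model="claude-sonnet-4", max_tokens=1024):
    """Package a list of prompt strings into batch-request dicts.

    NOTE: the model name above is a placeholder assumption, not a
    guaranteed-current identifier.
    """
    return [
        {
            "custom_id": f"task-{i}",
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for i, prompt in enumerate(prompts)
    ]

# Usage (requires the anthropic SDK and an ANTHROPIC_API_KEY in the env):
#   import anthropic
#   client = anthropic.Anthropic()
#   batch = client.messages.batches.create(
#       requests=build_batch_requests(["Summarize today's error logs."])
#   )
#   # Poll batch.id later to retrieve results.
```

This fits the "kick it off at the end of the day" workflow from upthread: submit the batch before logging off, collect results in the morning.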