For many years people have wanted to set a spending limit on a Google Cloud account, but Google never implemented one. The suggestion has always been a workaround: host a Cloud Run function that removes billing from the project via the API https://docs.cloud.google.com/billing/docs/how-to/disable-bi...
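For anyone wondering what that workaround roughly looks like, here is a minimal sketch of the decision logic only. It assumes the budget notification arrives as a Pub/Sub-style event whose base64 `data` field decodes to JSON with `costAmount` and `budgetAmount`; `handle_budget_message` and the `detach_billing` callback are hypothetical names, and in a real function the callback would wrap the Cloud Billing API call that clears the project's billing account.

```python
import base64
import json
from typing import Callable


def should_disable_billing(notification: dict) -> bool:
    """Return True once the reported cost has passed the budget amount."""
    cost = float(notification.get("costAmount", 0.0))
    budget = float(notification.get("budgetAmount", 0.0))
    return budget > 0 and cost > budget


def handle_budget_message(
    event: dict, project_id: str, detach_billing: Callable[[str], None]
) -> bool:
    """Decode a Pub/Sub-style event and detach billing if over budget.

    `detach_billing` is a hypothetical callback (an assumption for this
    sketch); it stands in for the real call that removes billing.
    """
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    if should_disable_billing(payload):
        detach_billing(project_id)
        return True
    return False


if __name__ == "__main__":
    detached = []  # records which projects the fake callback was asked to detach
    msg = {"costAmount": 120.5, "budgetAmount": 100.0}
    event = {"data": base64.b64encode(json.dumps(msg).encode("utf-8"))}
    print(handle_budget_message(event, "my-project", detached.append), detached)
```

Note that this only reacts to whatever cost the notification reports, so it is a kill switch that fires after the fact, not a cap.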
At scale, distributed API routing shouldn't call accounting transactions. That expands the availability risk surface and adds latency to all valid requests for no reason (other than helping the minority of companies/users who want their product to stop working when it is popular).
Distributed “shared nothing” API handling should make usage available to accounting, and the API handling orchestrator should have a hook that allows accounting to revoke or flag a key.
This gets the accounting transactions and key availability management out of the request handling.
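A rough sketch of that split, to make it concrete. Everything here is hypothetical (the `ApiHandler` and `Accounting` classes, the in-process queue standing in for whatever usage pipeline a real system would use); it is only meant to show the request path doing a local lookup while accounting runs out of band and revokes through a hook.

```python
import queue
from collections import defaultdict
from typing import Callable, Dict


class ApiHandler:
    """Request-path component: no accounting calls, only a local set lookup."""

    def __init__(self, usage_events: queue.Queue):
        self.usage_events = usage_events
        self.revoked_keys = set()

    def revoke(self, api_key: str) -> None:
        """Hook exposed so accounting can revoke a key."""
        self.revoked_keys.add(api_key)

    def handle(self, api_key: str, units: int) -> str:
        if api_key in self.revoked_keys:
            return "rejected"
        # Usage is made available to accounting; the request does not wait on it.
        self.usage_events.put((api_key, units))
        return "ok"


class Accounting:
    """Out-of-band consumer: totals usage and revokes keys at their limit."""

    def __init__(self, usage_events: queue.Queue, limits: Dict[str, int],
                 revoke_hook: Callable[[str], None]):
        self.usage_events = usage_events
        self.limits = limits
        self.revoke_hook = revoke_hook
        self.totals = defaultdict(int)

    def drain(self) -> None:
        """Process all queued usage events, then return."""
        while True:
            try:
                api_key, units = self.usage_events.get_nowait()
            except queue.Empty:
                return
            self.totals[api_key] += units
            limit = self.limits.get(api_key)
            if limit is not None and self.totals[api_key] >= limit:
                self.revoke_hook(api_key)


if __name__ == "__main__":
    events = queue.Queue()
    handler = ApiHandler(events)
    accounting = Accounting(events, {"key-a": 100}, handler.revoke)
    print(handler.handle("key-a", 60), handler.handle("key-a", 60))
    accounting.drain()  # runs later, outside the request path
    print(handler.handle("key-a", 1))
```

The trade-off is visible in the demo: the second request goes through even though it pushes the key past its limit, because enforcement only happens once accounting catches up.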
I haven't used these budget alerts. Maybe they are a pain to implement?
https://docs.cloud.google.com/billing/docs/how-to/budgets
They are still not a spending cap, of course.
Reminds me: ever used the Gemini API through Vertex AI on Google Cloud? The usage shows up in the dashboard something like 24-48 hours later. So when you use Gemini's API on their cloud, I, as a Workspace admin, cannot even track my own usage in near real time there. Which makes me think that even Google cannot track it in real time.
As someone who is new to the whole Google Cloud ecosystem, I find the number of dark patterns they employ absolutely shocking. Just off the top of my head:
1. You never know how much a single Gemini API request will cost or did cost
2. It takes anywhere from 12 to 24 hours to tell you how much they will charge you for past aggregate requests
3. No simple way to set limits on payment anywhere in Google Cloud
4. Either they are charging for the Batch API before even returning a result, or their "minimal" thinking mode is burning through 15k tokens for a simple image description task with <200 output tokens. I have no way of knowing which of the two it is. The tokens in the UI are not adding up to the costs, so I can only assume it's the first.
5. Incomplete batch requests can't be retrieved once they expire, even though you are charged for them.
6. A truly labyrinthine UI experience that would make modern gacha game developers blush
All I have learned here is to never, ever use a Google product.