At scale, distributed API routing shouldn't call accounting transactions, that expands the availability risk surface and adds latency to all valid requests for no reason (other than helping the minority of companies/users who want their product to stop working when it is popular).
Distributed “shared nothing” API handling should make usage available to accounting, and the API handling orchestrator should have a hook that allows accounting to revoke or flag a key.
This gets the accounting transactions and key availability management out of the request handling.
That is a nice excuse, do you work at Google? :) I get the idea of not slowing down requests or risking availability, but don’t tell me a company as big as Google can’t design an asynchronous accounting system robust enough to handle this. We’re not talking about penny-perfect precision - blocking at 110% or even 150% of the set cap would be enough. Right now, though, there’s nothing to prevent a $5k, 20k or even higher bill surprise due to API key leaks, misuse or wrong configuration. To me, this is unacceptable and one of the reason I try to avoid using gcloud (the other one is unbearably slow gogole cloud console "webapp").