How are you so sure that frontier API models are always running the same quant/weights/etc? Do you really think OpenAI and Anthropic are essentially just running vLLM endpoints? Of course not.
Firstly, we know Anthropic has been injecting prompts into its first-party (1P) API, though AFAIK not on Bedrock/Vertex, for at least a year now. https://old.reddit.com/r/ClaudeAI/comments/1f6hcwo/injection...
You can verify this pretty quickly, like OP did: check the token metrics. If your context contains terms that fire the classifier, you’ll see input_tokens come back higher than what you actually sent.
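A rough sketch of that check, if you want to script it: send a control prompt, then the same prompt with one suspected trigger term appended, and compare the `usage.input_tokens` each response reports. Everything below (the `Probe` class, the one-token-per-byte upper bound, the slack of 2) is my own assumption, not anything Anthropic documents, and the numbers in the example are made up.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """One request: the text added relative to the control prompt, and what the API reported."""
    extra_text: str              # "" for the control request itself
    reported_input_tokens: int   # usage.input_tokens from the response


def unexplained_tokens(control: Probe, probe: Probe, slack: int = 2) -> int:
    """Return how many reported tokens the added text cannot account for.

    Assumption: a byte-level tokenizer, so the added text adds at most one token
    per UTF-8 byte. `slack` absorbs tokenization changes at the boundary where
    the added text meets the rest of the prompt.
    """
    max_added = len(probe.extra_text.encode("utf-8")) + slack
    delta = probe.reported_input_tokens - control.reported_input_tokens
    # Anything beyond the worst case for the added text is unexplained overhead.
    return max(0, delta - max_added)


if __name__ == "__main__":
    # Illustrative numbers only, not real measurements.
    control = Probe(extra_text="", reported_input_tokens=120)
    probe = Probe(extra_text=" <suspected trigger term>", reported_input_tokens=250)
    print(unexplained_tokens(control, probe))  # → 103
```

The bound is deliberately loose because the current tokenizer isn't public, so a nonzero result only says the jump is bigger than the added text could possibly explain, which is exactly the signal you're looking for.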
So if they’re already doing that, what makes you think it’s just a dumb API, instead of a complicated pipeline filled with trade secrets and optimisations?