productivity (tokens per second per hardware unit) increases at the cost of output quality, but the price remains the same.
both Anthropic and OpenAI quantize their models a few weeks after release. they'd never admit it out loud, but it's more or less common knowledge now. no one has enough compute.
Pretty bold claim - you have a source for that?