And to add insult to injury, some providers will ride on the good reputation of some local model, selling you a terrible quant instead.
With OpenAI, at least my gpt-5.5 is the same as your gpt-5.5. You can't say that about glm for example.
Sam Altman, good reputation?
> With OpenAI, at least my gpt-5.5 is the same as your gpt-5.5.
How do we know that? The "orchestration" layer probably forwards to different levels of quantization. And it seems tempting to make some sort of load balancer with adaptive computation effort.
> some providers will ride on the good reputation of some local model, selling you a terrible quant instead.
Quants in popular local inference apps (Ollama, LM Studio, etc) are the worst possible quants (RTN).
That's not a real equivalency. They are not necessarily the same (testing in production, hello!) And most importantly you do not have a local model because openai is not open!
> And to add insult to injury, some providers will ride on the good reputation of some local model, selling you a terrible quant instead.
I just started using OpenRouter for some control testing of local models and what surprises me the most isn't that there are different providers providing different quantization levels, that makes sense, but I can't seemingly find a way of seeing what provider+model+quantization is actually used?! https://openrouter.ai/models shows the models, then say https://openrouter.ai/moonshotai/kimi-k2.7-code shows the providers but when I go to https://openrouter.ai/moonshotai/kimi-k2.7-code?endpoint=e7a... for example, why on earth is it not showing the actual details about the actual weights they're serving?! Give me details! It does have a "Precision" value that is sometimes filled out, but that seems to be a guess at best, even providers with the same values there have wildly different quality responses.
I like the idea about OpenRouter but holy hell does the implementation seem very far off from what it needs to be, in order to be useful.