Hacker News

stanac today at 10:58 AM

> Some are even offering API rates at 3x lower than the official ZAI api rates

Looking at OpenRouter [1], some of the cheaper offerings are for quantized models, and I'm not sure how much intelligence is lost in quantization. They are also not 3 times cheaper. Where did you find 3x lower API prices? At that price I would consider skipping OpenRouter and using those providers directly.

edit:

I see, croft [2] 8bit for $0.50/$0.08/$2.20

[1]: https://openrouter.ai/z-ai/glm-5.2

[2]: https://ai.nahcrof.com/pricing


Replies

scrlk today at 12:09 PM

IME, unquantized -> FP8 is pretty much lossless. What matters more is keeping the KV cache unquantized: using an FP8 KV cache can result in a significant drop in quality.
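For a rough feel of what FP8 rounding does to a single value, here is a minimal sketch. It assumes an E4M3-style layout (3 stored mantissa bits) and ignores exponent range, subnormals, and saturation; the function names are made up for illustration. It only shows per-value rounding error and says nothing about end-to-end model quality, which is what the claim above is about.

```python
import math


def round_to_mantissa_bits(x: float, bits: int = 3) -> float:
    """Round x to a float with `bits` stored mantissa bits.

    Assumption: E4M3-style precision only. Exponent limits,
    subnormals, and saturation are deliberately ignored.
    """
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)          # x == m * 2**e with 0.5 <= |m| < 1
    scale = 2 ** (bits + 1)       # stored bits plus the implicit leading 1
    return math.ldexp(round(m * scale) / scale, e)


def relative_error(x: float, bits: int = 3) -> float:
    """Relative rounding error for a single nonzero value."""
    return abs(round_to_mantissa_bits(x, bits) - x) / abs(x)


if __name__ == "__main__":
    # Hypothetical sample values, not taken from any real model.
    for value in (0.3, 1.1, 7.0, 100.0):
        rounded = round_to_mantissa_bits(value)
        print(value, rounded, relative_error(value))
```

With 3 mantissa bits the per-value relative error is bounded by 2^-4 (6.25%) in this simplified model; whether that matters in practice depends on where the values are used, for example weights versus KV cache.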

benjiro29 today at 11:37 AM

Neuralwatt ... When you convert the actual energy usage and its price back to a per-token basis, the gap is large.

I do not have GLM 5.2 numbers because the whole default max setting is overkill. But the GLM 5.1 numbers had it at 12x cheaper than API rates, and at about 2.5x more tokens than ZAI's own subscription service.
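The reverse calculation can be sketched roughly like this, assuming billing is by energy (kWh) and you know how many tokens that energy produced. The function names and every number below are hypothetical placeholders, not real Neuralwatt or ZAI figures.

```python
def effective_price_per_million(energy_kwh: float,
                                price_per_kwh: float,
                                tokens: int) -> float:
    """Convert an energy bill into an effective price per 1M tokens."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    total_cost = energy_kwh * price_per_kwh
    return total_cost / tokens * 1_000_000


def times_cheaper(reference_price: float, effective_price: float) -> float:
    """How many times cheaper the effective price is than a reference price."""
    return reference_price / effective_price


if __name__ == "__main__":
    # Hypothetical inputs: 0.5 kWh billed at 0.40 per kWh for 2M tokens,
    # compared against a made-up API rate of 1.20 per 1M tokens.
    effective = effective_price_per_million(0.5, 0.40, 2_000_000)
    print(effective, times_cheaper(1.20, effective))
```

Plugging in your own metered energy, tariff, and token counts gives the per-token gap; the ratio is only meaningful if both prices cover the same mix of input and output tokens.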

Yes, it's FP8, but let's be honest: do we know for sure that even ZAI runs at FP16? I learned a long time ago with Claude and Codex how much cheating happens at the model level, even from the big boys.
