Hacker News

armchairhacker yesterday at 10:17 AM

And there’s an incentive to publish evidence of this to discourage it. Do you have any?


Replies

TeMPOraL yesterday at 10:54 AM

Models aren't just the big bags of floats you imagine them to be. Those bags are there, but there's a whole layer of runtimes, caches, timers, load balancers, classifiers/sanitizers, etc. around them, all of which have tunable parameters that affect the user-perceptible output.
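A minimal sketch of the point, with entirely hypothetical names (this is not any provider's actual serving code): the weights are fixed, but serving-side knobs outside the model can still change what the user sees.

```python
# Hypothetical sketch: the weights ("bag of floats") are only one input to the
# user-visible output -- serving-layer knobs change it without touching them.
from dataclasses import dataclass

@dataclass
class ServingConfig:
    max_tokens: int = 1024      # truncation applied by the runtime
    safety_filter: bool = True  # classifier/sanitizer in front of the model

def serve(raw_model_output: str, cfg: ServingConfig) -> str:
    """Post-process a fixed model output according to serving-side config."""
    out = raw_model_output[: cfg.max_tokens]
    if cfg.safety_filter and "forbidden" in out:
        out = "[filtered]"
    return out

# Same model output, different user-perceptible result:
print(serve("forbidden plan ...", ServingConfig()))                      # -> [filtered]
print(serve("forbidden plan ...", ServingConfig(safety_filter=False)))   # -> forbidden plan ...
```

Any of these parameters can be tuned in production without retraining or swapping the weights.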

woadwarrior01 yesterday at 11:23 AM

There's this[1]. Model providers have a strong incentive to switch (a part of) their inference fleet to quantized models during peak loads. From a systems perspective, it's just another lever. Better to have slightly nerfed models than complete downtime.

[1]: https://marginlab.ai/trackers/claude-code/
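The "just another lever" framing can be sketched as a load-based routing rule. This is purely illustrative, with made-up thresholds and model names, not anything a provider has confirmed:

```python
# Hypothetical sketch: route traffic to a quantized model replica when fleet
# load crosses a threshold -- slightly nerfed output beats complete downtime.

def pick_model(current_qps: float, peak_qps: float) -> str:
    """Return which (made-up) model variant to serve, based on fleet load."""
    load = current_qps / peak_qps
    if load > 0.9:
        return "model-int4"  # heavily quantized: cheapest, slightly degraded
    if load > 0.7:
        return "model-int8"  # mildly quantized: quality/cost middle ground
    return "model-bf16"      # full-precision weights

# At 95% of peak capacity, requests silently get the int4 variant:
print(pick_model(95.0, 100.0))  # -> model-int4
print(pick_model(50.0, 100.0))  # -> model-bf16
```

From the user's side this is invisible: same endpoint, same model name, different precision under the hood.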

coldtea yesterday at 12:11 PM

Anybody with more than five years in the tech industry has seen this done in all domains, time and again. What evidence do you have that AI is different? That's the extraordinary claim in this case...