The comments I see recommending selective use of cheaper models doesn't match the reality I exp...

harimau777 • today at 12:04 PM • 9 replies • view on HN

The comments I see recommending selective use of cheaper models doesn't match the reality I experience working in the industry. I have the constant threat hanging over my head of being fired if I don't churn out code quickly enough. I'm not willing to gamble with my livelyhood by using a less effective model.

Saving money on tokens isn't something that's rewarded during performance reviews; particularly because it's difficult to quantify how much you saved versus hypothetically using a more expensive model.

Replies

bob1029 • today at 3:05 PM

I think quantifying tokens used is analogous to quantifying the amount of sawdust generated on a construction site.

Churning out useful code quickly is not solved by using more tokens per unit time. Most non-technical leaders can grasp this one and are likely more interested in the strategic game theoretical dynamics that are being forced by way of implied token consumption expectations (competition between developers).

If you want to hold out as long as possible and don't really care about anything other than the compensation package, you should at least play along with this new game in a half-assed manner. Try to goldilocks your token usage between any established extremes. You want to be in the statistical barycenter of every AI report that management can create.

➕ show 3 replies

gchamonlive • today at 2:28 PM

> I have the constant threat hanging over my head of being fired if I don't churn out code quickly enough.

And the tragedy is that this isn't sustainable, and we all involved deeply in tech know this. There is eventually going to be a big reality check the companies will have to pay, because you can't force creativity and quality, not even with AI, because actual intelligence lies with us at least for now and for the foreseeable future. However when the rope eventually snaps these executives at best will fall upwards, with big severance bonuses and a list of "contributions" we have to be grateful for. We are the ones that will suffer through the next big layoffs.

➕ show 3 replies

Terretta • today at 2:38 PM

Anyone (including ANTHROP\C) "recommending selective use of cheaper models" is spending costly human time (which costs more over time) on correcting the machine (which costs less over time). This is a bad trade.

In cost per line of code, we have verified this is always an error unless your time is worth less than the machine (unlikely unless you consider your time to have no cost rather than considering it as your hourly rate).

The worst thing for our productivity has been Claude Code or Claude Cowork taking a complex problem and turning around and writing bad instructions for dumb model agents then synthesizing the dumb answers into an orchestra of badness.

The single best fix for results-per-total-cost is to ensure it reads and thinks about whole content, not snippets, and thinks with the smartest model, not agents.

Agents should toil. Agents should neither think*, nor decide what to think about which itself is thinking.

* Agents should “think” like ants or bees or beavers think. Any human-like thinking, *especially* intuition-like thinking, should be thought by the best model available.

** Nobody should be “churning out code”. In a hierarchy of coders who translate detailed specs to some computer language, developers who write software that ships on a project timeline, and engineers who accomplish business goals, engineers should “churn out” engines structured for business outcomes.

Measured by that, the machine is leverage while reducing a variety of costs. At the same time, because most training data doesn't grok this, the machine doesn't grok it either. So it needs you to shape its toil.

➕ show 3 replies

krzyk • today at 12:44 PM

If you have such toxic environment, run.

➕ show 3 replies

lumost • today at 2:55 PM

This, I happily used the opus 4.6 fast mode to the tune of 5k for a project. The delivery of the project justified the 5k, if I only spent 500 but delivered the project 1 month later - I would have been in the dog house.

➕ show 1 reply

giancarlostoro • today at 3:04 PM

My real comment is, why were they not just using their self-hosted copies of it? Do they pay back Anthropic for use of it in Azure? Broker a deal, let Anthropic charge you drastically less to use their model AND Anthropic could have made Claude Code work directly with Azure for Microsoft employees. Pennies on the dollar, and Microsoft could do it using low use GPUs to save on cost, or stack underused GPU compute (this is how serverless was born btw - its the unused resources in a web server somewhere).

When you consider that xAI's old data center was enough to bring Anthropic back ahead, it tells me Microsoft could host their own on underutilized previous gen GPUs that are sitting there wasting server real estate.

locknitpicker • today at 5:42 PM

> The comments I see recommending selective use of cheaper models doesn't match the reality I experience working in the industry. I have the constant threat hanging over my head of being fired if I don't churn out code quickly enough. I'm not willing to gamble with my livelyhood by using a less effective model.

I don't buy it. Old models such as GPT4.1 were faster than newer reasoning models, and their output was as good. Newer models end up wasting an ungodly amount of time with chain-of-thought steps which can be a complete waste of time if you have a structured prompt such as a plan or a spec.

My experience in the real world is that users have to ration requests, and x0 models actually tend to be used far more because expensive models are left for more complex tasks.

➕ show 1 reply

bogota • today at 3:25 PM

[dead]

cowsandmilk • today at 12:18 PM

This, if you’re high performing, the company won’t question your use of tokens. If they want to limit it, they have ways to set limits on spend and usage.

alt Hacker News

Replies