
bastawhiz • yesterday at 12:45 PM • 14 replies

This isn't a good analysis, and it's because it keeps rounding everything up. He rounds up the cost of electricity by 10%. He has a range of power use, takes the high end (which is 2x the low end) and multiplies it by the inflated electricity cost.
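
To show how those two roundings stack, here is a minimal Python sketch of electricity cost per million tokens. The helper `electricity_cost_per_mtok` is hypothetical, and the wattage, $/kWh rate, and 20 tokens/s are placeholder values, not the article's numbers. Only the 10% rate bump and the 2x high-to-low power ratio come from the comment. Whatever base values you plug in, the two roundings together inflate the energy cost by 1.1 × 2 = 2.2x.

```python
def electricity_cost_per_mtok(watts: float, usd_per_kwh: float, tokens_per_sec: float) -> float:
    """Electricity cost (USD) to generate one million tokens at a steady rate."""
    hours_per_mtok = 1_000_000 / tokens_per_sec / 3600
    kwh_per_mtok = watts / 1000 * hours_per_mtok
    return kwh_per_mtok * usd_per_kwh


# Hypothetical inputs, for illustration only (not from the article).
BASE_RATE = 0.15            # $/kWh
LOW_WATTS = 60.0            # low end of a power range
HIGH_WATTS = 2 * LOW_WATTS  # the comment says the high end is 2x the low end
TOKENS_PER_SEC = 20.0       # within the 10-40 tokens/s range quoted in the thread

careful = electricity_cost_per_mtok(LOW_WATTS, BASE_RATE, TOKENS_PER_SEC)
rounded_up = electricity_cost_per_mtok(HIGH_WATTS, BASE_RATE * 1.10, TOKENS_PER_SEC)

print(f"low end, base rate:   ${careful:.3f}/Mtok")
print(f"high end, rate + 10%: ${rounded_up:.3f}/Mtok")
print(f"inflation factor:     {rounded_up / careful:.2f}x")  # prints 2.20x
```

The ratio does not depend on the placeholder values; only the absolute dollar figures do.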

But then they talk about using a newly purchased Mac to do the inference, running at full capacity, 24/7. Why would you do that? Apple silicon is fast but the author points out: you're only getting 10-40 tokens per second. It's not bad, but it's not meant for this!

It's comparing apples to oranges. Yeah, data centers don't pay residential electricity rates. Data centers use chips that are power efficient. Data centers use chips that don't also have to work as a Mac.

Apple silicon works out pretty well if you're not burning tokens 24/7/365 and you're not buying hardware specifically to do it. I use my Mac Studio a few times a week for things I need it for, but I can also run Ollama on it over the tailnet "for free". The economics work when I'm not trying to make my Mac Studio behave like a liquid-cooled H100 cluster. Which should come as no surprise to anyone: more tokens per watt on multi-tenant hardware with cheap electricity will pretty much always win.

Replies

datadrivenangel • yesterday at 12:59 PM

Rounding everything down in the most optimistic setting got me to $0.40 per million tokens, and OpenRouter has the same model at $0.38/Mtok.


avidphantasm • yesterday at 10:59 PM

Not sure where 40 tokens per second is coming from. I’ve seen 95-100 tokens per second on M5 Max 128GB running Gemma 4 31B. I’ve done experiments where it is faster than Claude Opus 4.5 for the same prompts.

faitswulff • yesterday at 1:43 PM

The article makes no sense. I can't use OpenRouter as a general purpose computing device. Why are we comparing a whole computer to a single purpose SaaS?


ikidd • yesterday at 11:24 PM

Actually, figuring it on generating tokens 24/7 is the best-case scenario. If you figure it at 8 hours a day of actual use, the fixed cost of the hardware is still the largest part of the budget, but now you generate 1/3 of the tokens, so you triple that cost per token.
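
A rough sketch of that amortization argument, assuming straight-line depreciation and ignoring electricity entirely. The helper name is hypothetical, and the $4,000 price, 3-year life, and 20 tokens/s are illustrative placeholders, not figures from the article. Because the hardware cost is fixed, cutting use from 24 to 8 hours a day cuts total tokens to a third and so triples the hardware cost per token.

```python
def hardware_cost_per_mtok(hardware_usd: float, lifetime_years: float,
                           hours_per_day: float, tokens_per_sec: float) -> float:
    """Straight-line amortized hardware cost (USD) per million generated tokens."""
    total_tokens = tokens_per_sec * 3600 * hours_per_day * 365 * lifetime_years
    return hardware_usd / (total_tokens / 1_000_000)


# Hypothetical machine: $4,000, 3-year life, 20 tokens/s (illustrative only).
full_time = hardware_cost_per_mtok(4000, 3, 24, 20)
workday = hardware_cost_per_mtok(4000, 3, 8, 20)

print(f"24 h/day: ${full_time:.2f}/Mtok")
print(f" 8 h/day: ${workday:.2f}/Mtok")
print(f"ratio:    {workday / full_time:.1f}x")  # prints 3.0x
```

The same function also illustrates the opposite direction: the less the machine is used, the more each token has to carry the purchase price.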

econ • today at 1:09 AM

Boss, I make 16.50 per hour, say 15, I work 36 hours, say 35, say 500 per week, say 4 weeks per month, that's only about 2000! Don't you agree I need a raise?
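
The joke works because every "say ..." step rounds in the same direction, so the errors compound. Worked through in Python using only the figures in the comment:

```python
hourly, hours_per_week, weeks_per_month = 16.50, 36, 4

actual_weekly = hourly * hours_per_week           # 594.0
actual_monthly = actual_weekly * weeks_per_month  # 2376.0

# The comment's rounding steps: "say 15" per hour, "say 35" hours...
rounded_weekly = 15 * 35                          # 525
# ...then "say 500 per week".
rounded_monthly = 500 * weeks_per_month           # 2000

shortfall = 1 - rounded_monthly / actual_monthly
print(f"weekly: actual {actual_weekly:.0f}, after first two roundings {rounded_weekly}")
print(f"actual: {actual_monthly:.0f}, claimed: {rounded_monthly}, understated by {shortfall:.1%}")
# prints "actual: 2376, claimed: 2000, understated by 15.8%"
```

No single step looks large, but together they understate the total by roughly 16%.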

outside1234 • today at 3:26 AM

We also have no idea what it actually costs Anthropic. This could be wildly subsidized and actually Apple Silicon is more cost effective.

statestreet123 • yesterday at 2:31 PM

Rounded up, yes, and oddly inefficient for someone obsessed with inefficiency. One could buy a brand-new 64 GB M5 MacBook for well over $4k. Another could buy a scratched-up but functioning 64 GB M1 Max off eBay for a little over $1k, and somehow get the same 10-20 t/s with 31B that the author does with an M5. Or better yet, have a frontier model do the planning and judging, and have a local MoE model execute at 50 t/s. All of this is achievable by a former English major with too much free time.


giancarlostoro • yesterday at 7:51 PM

Honestly, I don't see my MacBook Pro costing me anywhere near as much as any of these AI services, but maybe the increase in my power bill just isn't big enough for me to notice? I'm a power user: I use Claude Max pretty much all the time to prototype ideas and build things I actually use, and it has given me a lot of value. I work full time and have a family to raise and care for, so my free coding time is mostly limited to ideas. Now I can draft a detailed plan, review the code, run it, test it, and use software custom-tailored to my needs.

dist-epoch • yesterday at 1:02 PM

Using it 24/7 brings the average cost down, not up.

The less you use a local LLM, the less sense it makes, since you paid a lot for hardware you don't use.


make3 • yesterday at 10:28 PM

The real reason this comparison makes no sense is that only a vanishingly small fraction of people seriously using AI to code would use a model so far behind the top models (including the open-source ones).

He should compare his MacBook against OpenRouter on Kimi 2.6 (1.1T) or GLM 5.1 (754B) at bfloat16 precision, which of course he can't.

But it furthers his point that things like open router are a better idea, which is not surprising.

PunchyHamster • yesterday at 8:01 PM

> Yeah, data centers don't pay residential electricity rates.

There are 2 caveats here:

Some places have higher prices for industrial than for residential power, since residential power may be subsidized by the government.

And data centers also pay for cooling, which residential users effectively pay for only if they run AC and it's hot outside. So the effective power cost is some multiple of the industrial rate.
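
One standard way to put a number on that multiple is PUE (power usage effectiveness): total facility energy divided by the energy used by the IT equipment itself, so it covers cooling plus other overhead such as power conversion. A minimal sketch, with hypothetical rates and a hypothetical PUE of 1.2 chosen only for illustration, and treating a home with no extra cooling load as roughly PUE 1.0:

```python
def effective_rate(usd_per_kwh: float, pue: float) -> float:
    """Cost per kWh actually delivered to the compute hardware.

    PUE = total facility energy / IT equipment energy, so each kWh consumed
    by the chips costs `pue` kWh at the meter (cooling, power conversion, etc.).
    """
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0 by definition")
    return usd_per_kwh * pue


# Hypothetical rates and PUE, for illustration only.
INDUSTRIAL = 0.08   # $/kWh
RESIDENTIAL = 0.15  # $/kWh

print(f"data center (PUE 1.2):       ${effective_rate(INDUSTRIAL, 1.2):.3f}/kWh")
print(f"home, no AC load (PUE ~1.0): ${effective_rate(RESIDENTIAL, 1.0):.3f}/kWh")
```

Under these made-up numbers the data center still comes out cheaper per kWh, but the gap is smaller than the raw tariffs suggest.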


llm_nerd • yesterday at 2:46 PM

Your post makes sense if you bought the hardware for other reasons, and maybe run models occasionally as a novelty.

That isn't the case for many, though, and there is a whole social media space where people are hyping up the latest homebrew options for running models, believing it frees them from the yoke of big AI.

Millions of people are buying expensive, maxed-out hardware like Mac Studios or DGX boxes specifically to run LLMs. Someone rationally running the numbers is a good thing.


cyanydeez • yesterday at 1:28 PM

Nothing about the current data center craze looks efficient.


