bensyverson • yesterday at 5:38 PM • 30 replies

The article is based on running Qwen 3.6 on a 128GB MacBook Pro. For reference, a 128GB MBP currently starts at $6699 USD [0]

Some people will be happy to pay that premium for privacy, but at roughly 10X the cost of a MacBook Neo, that money could also buy a lot of credits on OpenRouter or directly from the frontier labs.

[0]: https://www.apple.com/shop/buy-mac/macbook-pro/14-inch-space...

Replies

dofm • yesterday at 5:50 PM

The maths there is pretty undeniable, but it is not where I'd make the split. Having a machine that can run some modest local LLMs, like the Gemma 4 12B, is really worth it.

I don't know how much serious hands-free agentic coding I will ever do on my MacBook alone, but I do know that I would not have got so far into understanding this without tinkering with local models, llama.cpp, LM Studio and all that.

I totally struggled to find the right frame of mind to explore any of this stuff without feeling defeated and bamboozled. Because it's just huge, exhausting, jargon-drenched, unknowable, and I am over the hill at fifty-plus.

Until, that is, I could poke around with setting it up on my own (secondhand) machine, watching the API calls, understanding some of the terminology. I didn't even buy the machine for that; it's just adequate to the task.

The Neo is too small to really get much benefit from this opportunity to make it more visceral and knowable.

porphyra • yesterday at 5:59 PM

You can also run the Qwen 3.6 27B dense model on a DGX Spark with comparable performance [1][2] for about $4000 (the Asus Ascent GX10 is $3999 at various retailers).

In theory you can also get 48GB of VRAM with, say, two 3090s, but it will take up a lot of space and generate a lot of heat compared to the MacBook Pro and GB10.

[1] https://x.com/MiaAI_lab/status/2070859135399182444

[2] https://github.com/MiaAI-Lab/Qwen3.6-27B-NVFP4-vLLM

Catloafdev • yesterday at 5:46 PM

The model they reference can easily be run with 24GB+ of VRAM, and there are other similar models capable of running easily on 16GB of VRAM. It's not like 128GB is a requirement here.

throw1234567891 • yesterday at 6:35 PM

But the tokens or credits are gone. The MacBook stays. You can run other models on the same MacBook. Judging by what I read people burn every month on SaaS, you'd break even on that MacBook in 5 months.

Edit: it’s not just “data privacy”: when you are using Claude, you are shipping EVERYTHING to Anthropic. It’s crazy.

acchow • yesterday at 7:27 PM

That $6700 is a $5000 upgrade over a base model MacBook Pro.

$5000 in US Treasuries (currently at 4.89%) yields $244.5/yr. That's more than enough to cover the annual Claude Pro subscription ($200/yr) which includes Claude Code with lots of Sonnet usage (far better than Qwen 3.6)
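The yield comparison is simple interest on the upgrade cost. A quick sketch of the arithmetic using only the figures quoted in the comment; it ignores taxes, compounding, and any future change in rates or subscription prices, and the function name is just illustrative:

```python
def annual_yield(principal: float, rate: float) -> float:
    """Simple (non-compounded) interest earned on `principal` in one year."""
    return principal * rate


# Figures as quoted in the comment above.
upgrade_cost = 5000.0       # 128GB MBP price minus a base MacBook Pro
treasury_rate = 0.0489      # 4.89% Treasury yield
claude_pro_annual = 200.0   # annual Claude Pro subscription

interest = annual_yield(upgrade_cost, treasury_rate)
print(f"${interest:.2f}/yr")          # $244.50/yr
print(interest >= claude_pro_annual)  # True
```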

stymaar • yesterday at 6:09 PM

> The article is based on running Qwen 3.6 on a 128GB MacBook Pro. For reference, a 128GB MBP currently starts at $6699 USD [0]

Qwen3.6-27B would be faster on a 3090 that costs around $1000-1200, though, so I don't think it's a good counter-argument.

OP just happened to have that MacBook, but that doesn't mean it's necessary to run the model.

nozzlegear • yesterday at 5:50 PM

Just putting it out there: I run Qwen 3.6 on my M1 Mac Studio with 64GB. It's quantized and all that, but I agree with TFA: it's the sweet spot for local development right now.

dmayle • yesterday at 6:23 PM

For that price you can put together a PC with 128GB of RAM ($2000) and an RTX 5090 ($3600) and get 70-100 tokens per second instead of 45.

montebicyclelo • yesterday at 6:43 PM

Isn't the directionality important? I.e., it is currently possible to run useful, even great, models locally, but only on high-end machines; in a few years we will likely be able to run even better models on standard machines.

razster • today at 4:06 PM

I'm running it on my 12GB 4070 with 96GB of RAM, and I'm very happy with the results, even if I have to wait a couple of minutes for them. To me this is far better than I expected, and I will keep using it and improving it with skills.md. Pi.dev is amazing, by the way.

organsnyder • yesterday at 5:59 PM

I run Qwen 3.6 on my Framework Desktop 128GB, and it's very performant. I know Framework has had to raise the price since I preordered mine, but they're still well under half the cost of that MacBook.

elorant • yesterday at 7:15 PM

You can get an AMD Strix Halo for half that price, even after hardware price adjustments. Besides, you don't need 128GB of RAM to run a 27B model.

dannyw • yesterday at 5:53 PM

I’m running the same model on a 48GB MBP with a q4 quant and it’s pretty decent. You definitely don’t need 128GB. That’s the scale for 70B models at q8 or something.
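The rule of thumb behind these sizing claims: weight memory is roughly parameter count × bits per weight ÷ 8. A minimal sketch, assuming idealized bit widths; real quantization formats (GGUF q4 variants, NVFP4) store extra scale data, and the KV cache and runtime overhead come on top, so actual usage is higher:

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10**9 bytes).

    params * bits / 8 gives bytes; with params in billions, the result is
    already in GB. Excludes KV cache, activations, and runtime overhead.
    """
    return params_billions * bits_per_weight / 8


# Idealized bit widths, for illustration only.
for label, params, bits in [
    ("27B @ 4-bit", 27, 4),
    ("27B @ 8-bit", 27, 8),
    ("70B @ 8-bit", 70, 8),
]:
    print(f"{label}: ~{weight_memory_gb(params, bits):.1f} GB")
# 27B @ 4-bit: ~13.5 GB
# 27B @ 8-bit: ~27.0 GB
# 70B @ 8-bit: ~70.0 GB
```

Under this rough estimate, a 4-bit 27B model sits comfortably inside 24-48GB, while 128GB is closer to 70B-at-8-bit territory once overhead is added.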

shockembopper • today at 12:48 AM

I’ve got Qwen3.6 27B running on my media server at the moment. Given that I built on top of what I already had, it didn’t cost me nearly that amount. I’ve been running 2x 5060 Ti 16GBs, and when using text only and NVFP4, I can run the model with a 200k context length at roughly 50-60 tok/s. It’s very good, and it cost me about $800 after buying the GPUs from Micro Center.
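Context length costs memory too: with standard attention, the KV cache grows linearly with the number of tokens. A minimal sketch of the usual estimate; the layer count, KV-head count, and head size below are hypothetical placeholders, not Qwen 3.6's actual configuration (models with fewer KV heads or with linear-attention layers need less):

```python
def kv_cache_gb(
    n_tokens: int,
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    bytes_per_elem: float,
) -> float:
    """Estimated KV-cache size in GB (10**9 bytes) for one sequence.

    Each attention layer stores one key and one value vector per KV head
    per token, hence the leading factor of 2.
    """
    bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return n_tokens * bytes_per_token / 1e9


# Hypothetical attention config, chosen only to illustrate the formula.
LAYERS, KV_HEADS, HEAD_DIM = 64, 4, 128

for label, nbytes in [("FP16 cache", 2), ("FP8 cache", 1)]:
    size = kv_cache_gb(200_000, LAYERS, KV_HEADS, HEAD_DIM, nbytes)
    print(f"200k tokens, {label}: ~{size:.1f} GB")
# 200k tokens, FP16 cache: ~26.2 GB
# 200k tokens, FP8 cache: ~13.1 GB
```

Under these made-up numbers the cache alone runs to tens of GB at FP16, so cache precision and the model's attention design matter a lot when fitting a 200k context into 2x 16GB of VRAM.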

georgeven • yesterday at 5:54 PM

I have a $1500 machine that can run it at 50 tok/s (3 V100s).

jeffybefffy519 • yesterday at 11:03 PM

I still don't trust that Anthropic and OpenAI are not training on my code. Even just keeping track of what code you have received in prompts, and whether or not to train on it, seems like an impossibly difficult task.

redox99 • yesterday at 7:30 PM

I bought 2 used 3090s some years ago for $500 each. They're probably a bit more expensive now, but I guess for something like $2000 you can build a barebones 2x3090 PC which will be way faster than a MacBook. (You're fine with very basic hardware outside the GPUs.)

stared • yesterday at 10:06 PM

All experiments with Qwen 3.6 required no more than 48GB of memory on Apple Silicon. I believe you can go even lower with more aggressive quantization.

In any case, from an economic point of view, running models on laptops makes little sense. Even on energy costs alone, it might be hard to beat the per-token pricing of models served at scale.

At the same time, it is a breakthrough that will change the game. Previously, this kind of vibe coding on a consumer device was not merely hard or costly: it was impossible.

pimeys • today at 8:55 AM

Yes. It is very expensive now. I'm still so, so happy I decided last summer to bite the bullet and pre-order the Framework Desktop 128GB model.

I paid 2424 euros in total for this machine. And it can easily run the models discussed in the comments and in the article. It's tiny, and runs CachyOS like a champ. Over 4000 euros less than the price you listed.

We can all send a thank-you letter to our friendly billionaires such as Sam Altman for the price situation we're in today: https://www.mooreslawisdead.com/post/sam-altman-s-dirty-dram...

trentor • yesterday at 6:45 PM

Runs fine on 2x 4080s or on two 5060s/5070s with 16GB of VRAM... and faster than on the Mac.

dvduval • yesterday at 6:04 PM

Absolutely. For the average developer, the token speed is just going to be too slow for it to be workable. I think we’re looking at 2028, when memory becomes cheaper again and there’ll be a lot more people using local models.

cyanydeez • yesterday at 6:17 PM

AMD started their 128GB Strix Halo at a pretty damn good price point, ~$2.5k; I got mine after the first memory price bump at $3k.

I think you might be a little too into the stew here.

Insanity • yesterday at 5:39 PM

But you have to factor in that this device will last you 5-10 years. That said, I wouldn't spend almost $7k USD on this MacBook lol.

colinsane • yesterday at 6:39 PM

i like that people are taking the privacy argument seriously, after however many decades. i think there are other arguments to be made for running these locally which are less settled, but IMO the Fable debacle drives it home: the surest way to embrace this technology without worry that it will be taken away from you down the road is to physically own the compute.

ricardobayes • yesterday at 8:11 PM

Oh definitely. I've seen GLM 5.2 go for around $4 per million output tokens.
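To put that price against the MacBook figure at the top of the thread, a rough sketch: divide the budget by the per-million price. It counts output tokens only (input tokens are usually billed separately), and the ~45 tok/s local rate is taken from another comment in this thread, so treat the comparison as illustrative:

```python
def tokens_for_budget(budget_usd: float, usd_per_million: float) -> float:
    """Tokens purchasable at a flat per-million-token price."""
    return budget_usd / usd_per_million * 1_000_000


budget = 6699.0          # 128GB MacBook Pro price quoted in the thread
price_per_million = 4.0  # ~$4 per million output tokens, as quoted for GLM 5.2
local_tok_per_s = 45.0   # local generation speed mentioned in the thread

tokens = tokens_for_budget(budget, price_per_million)
days_to_generate = tokens / local_tok_per_s / 86_400  # seconds -> days

print(f"~{tokens / 1e6:,.0f}M output tokens")
print(f"~{days_to_generate:.0f} days of nonstop local generation")
# ~1,675M output tokens
# ~431 days of nonstop local generation
```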

oldfuture • yesterday at 5:48 PM

a lot of credits? we can’t predict how their prices will change

ant6n • today at 5:39 AM

Doesn't it run on the MacBook Neo... just slower?

AnimalMuppet • yesterday at 5:58 PM

How many credits would it buy? How long would it take to use them up? What's the payback period?

From what I understand, for a developer, $5000/month is maybe the high end, but $5000/year is fairly standard. (Is that accurate?) So if it pays back in 15 months, that's pretty decent. If it pays back in two months, that's spectacular.
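Under a simple linear model, the payback period is just hardware cost divided by monthly API spend. A sketch using the $6699 price from the top of the thread and the two spending levels in the comment; it ignores electricity, resale value, interest, and whether a local model can actually replace all of that usage:

```python
def payback_months(hardware_cost: float, annual_api_spend: float) -> float:
    """Months of avoided API spend needed to recover a one-off hardware cost."""
    if annual_api_spend <= 0:
        raise ValueError("annual_api_spend must be positive")
    monthly_spend = annual_api_spend / 12
    return hardware_cost / monthly_spend


HARDWARE_COST = 6699.0  # 128GB MacBook Pro starting price quoted in the thread

for label, annual in [("$5000/year", 5000.0), ("$5000/month", 5000.0 * 12)]:
    print(f"{label}: ~{payback_months(HARDWARE_COST, annual):.1f} months")
# $5000/year: ~16.1 months
# $5000/month: ~1.3 months
```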

h4ny • yesterday at 5:44 PM

[flagged]
