logoalt Hacker News

iagooaryesterday at 7:23 PM61 repliesview on HN

I love my MacBook Pro M5 128GB RAM and I love qwen3.6.

BUT DO NOT buy this MacBook if you plan on doing serious coding using local LLMs with it. The reason is simple: your fingers will burn and your head will explode from the noise.

Running any kind of sophisticated job on the very laptop you are using is just not viable. Sure you can use it in clamshell mode, but forget touching it while working with AI coding or agents.

If you want to run Qwen3.6 27B / 35B at its best, get a MacMini M4 with 64GB of RAM and put it in the basement - or at least a few meters from your desk. Connect to it over LAN or Tailscale. The MacMini will also cost you almost 1/3 of the MacBook Pro.

Thank me later.


Replies

jasonjmcgheetoday at 4:22 AM

I'm surprised no one has else has mentioned - low power mode.

With no speculative decoding, using high power mode, I get 80 t/s on 35B A3B - and it gets hot and spins up. On low power mode I get 38 t/s - no fans, cool to warm laptop.

If you currently don't use speculative decoding and you start using it, it can nearly offset the difference between high and low power, and it's night and day experience.

I almost always keep my laptop on low power mode.

show 1 reply
astrostlyesterday at 10:59 PM

> MacBook Pro M5 128GB RAM

614 GB/s of memory bandwidth

> MacMini M4 with 64GB of RAM

273 GB/s of memory bandwidth (also only currently available with 48GB)

When it comes to inference speed, you want your model to fit in memory, and then to have as much memory bandwidth as possible. In this case a hypothetical Mini with 1TB of memory would still be over 2x slower with 27-35B models.

And FWIW I have an M4 Max MBP 128GB that I keep on a Roost laptop stand, with a separate keyboard/mouse/video. It does fire up the cooling jets when running local LLMs, but stays within tolerance for me on noise. I haven't heat-tested it on longer runs, but I imagine the risen airflow helps a ton.

show 1 reply
SwellJoeyesterday at 8:45 PM

I opted to buy a normal 32GB laptop for this very reason. I know how loud and hot the GPUs in my desktop run when running even smallish models like Qwen 27B or Gemma 4 31B (which is a better model for most than Qwen 3.6, despite the benchmarks). I also have a Strix Halo which doesn't get loud, because it has a single huge fan, but it does get hot. So, there's no way a laptop could work as hard as models make them work, and not be unbearable. Tiny fans trying to remove all that heat? They gotta be screaming. No reason to spend all that money on a laptop that I couldn't realistically make use of. I do run a lot of VMs on my desktop, but I can get to those on a VPN.

It's a nice idea to run a model on a laptop so you can work anywhere...but, that's a job for models in the cloud. Not much data has to traverse the network, so it's not a big deal. Or one could also setup a VPN so you can reach a self-hosted model on a big box at home for things that require data privacy.

All that said, there are models that work great on very small devices for some tasks and won't work it to death. Gemma 4 12B QAT 4-bit runs on a 16GB device, maybe even smaller, including a tablet. It's the best self-hostable vision model I've tested for my purposes (categorization, identification, labeling, type stuff), beating much larger models. It's also a decent conversationalist with good prose but it doesn't know much of anything (not a lot of the world fits in 7GB), so it needs search if you want to use it for research. It's a pretty good tool user. I definitely wouldn't want to use it for code, though, beyond very simple stuff.

show 1 reply
andaiyesterday at 8:40 PM

> The reason is simple: your fingers will burn and your head will explode from the noise.

So, just buy a mac mini and put it in the other room? ( Like everyone was doing in February? :)

I've been running coding agents on my laptop in yolo mode for the past half year or so (though mostly not local ones, laptop too slow!) and the way I'm doing that without terror is that I just gave them their own Linux user "agent". They're free to nuke their homedir /agent, and they can't touch (or even read) mine.

There's some slight ergonomics issues (I need to sudo into the user to do anything, but I set up an alias for it), sometimes I get issues with permissions or ownership (gave up on "sticky bits" and just made a function I can run once a day when it breaks).

There's enough hassle that I wish I just had a dedicated machine for it, and then I'd just give them root on it. (For giggles I gave claude root on a $3 VPS and that's going just fine...)

But yeah after months of trial and error I reinvented "just buy a mac mini" from first principles...

show 3 replies
roadside_picnicyesterday at 10:53 PM

In general if you're setting up a local LLM you should assume it's going to be primarily working as a server and talking to various clients. I use my MBP, but that's because I don't travel much anymore so it can happily work as a server at all times. With the right agent setup you can probably manage most things from your phone even if you don't have a seperate machine to use as a client.

I have an older laptop I run a hermes agent on backed by an API based open (non-local) model and Macbook Pro M4 for running another model locally (also using hermes). The agents have a Mattermost (open source version of slack) server they run and I run Mattermost on my phone so I can talk to them and task them with things. In fact, it was through the hermes WhatsApp endpoint that I got the first agent (non-local) to setup the Mattermost server and unboard the second agent (local mbp).

Then I can just chat with them through Mattermost when I need work done. Whenever I need something done I just hope on the Mattermost server and chat with them. I've had them build me multiple research reports (the fully local agent did awesome at this), learn how to use Stable Diffusion on my desktop to generate images, install and perform maintenance on various local services I run (including Open WebUI).

jtbakeryesterday at 11:53 PM

Nope, have both these machines, can confirm the M5 max blows the M4 mini away. It does get hot, but I use it mostly with an external monitor and keyboard. Conceptually I like the headless model better with a workstation, but work was buying the M5 and can't get it in any other form factor at the monute.

827atoday at 12:33 AM

Apple does not sell a 64GB variant of the M4 Mac Mini. IIRC they never have; its always capped out at 48GB.

If you were planning on getting an M5 128GB; just get a DGX Spark (~$4500) or a 5090-equipped machine (~$4500) plus a Macbook Air (~$1500). You'll come in below the M5 Max 128 pricing (~$6700+ USD) and be happier for it.

show 3 replies
Terrettatoday at 11:43 AM

I have that model, and do local LLMs and local image generation. DO buy this if you plan on serious local LLM use and enjoy working from anywhere.

Don't expect workstation loads with no fan or heatsink, true. But it's not a real problem, it's still quieter than a desktop.

That said, rather than Mac Mini, if you only work from one place, I'd recommend a Studio Ultra M3 with 512GB. Same or more tokens per second, multiple models loaded. Cool and quiet.

lprdtoday at 5:18 PM

Yikes! I've been needing an upgrade, and I was on the fence between a specc'd out MBP, or building out a AI server and delegating tasks to it over Netbird/Tailscale to my homelab.

I'm mainly interested in coding/image creation tasks. Has anyone built out a server for a similar use-case and, if so, whats your experience been? What cards should I be looking into? Am I looking at spending ~10-15k for something that can give me near frontier quality/speed? I know about the DGX Spark/Mac Mini's, but I'd like to be able to upgrade later down the road.

actersyesterday at 7:34 PM

Would the new upcoming AMD AI ryzen halo desktop be a better value offer? or dgx spark?

You would have to get a third party reseller/scalper or refurbished mac mini to get 64gb of ram ever since apple stopped selling it.

show 4 replies
swangyesterday at 7:44 PM

I have an M4 Max and when I was trying out local LLM work with pi it has probably felt like the hottest I've ever felt any kind of Macbook be. I could feel the radiated heat off it even a few inches away. Honestly felt hotter than any Intel Macbook I've used. Because of that I stopped as I didn't want to harm my laptop in case I need to hold it for 10 years due to all the supply issues/price increases.

show 1 reply
Roark66today at 2:13 PM

I think there is no reasonably priced machine you could run locally to do serious work with LLMs...

10x rtx6000 Pro in a large workstation is probably the way to go for someone wanting to run GLM5.2.

Other than that it is cloud.

As good as these small models got we are still not "at breakeven" for me.

What is "breakeven" with LLMs? For me it is when I no longer have to read the actual code it wrote. I can trust that if I told it to implement and document a certain architecture it actually did that with no stupid mistakes.

The first model ever that did that for me was the first opus. 4.4 if I remember correctly.

The second model was Gemini 3 Pro preview. For few weeks. Then it was lobotomised. I guess it was too expensive to run and they quantized it too hell.

Only Opus remains. If this GLM model truly rivals even an old opus I'll be very happy when day comes that I'll be able to run it locally.

HSOtoday at 7:31 AM

running potentially sota open-weight models locally only became a thing in fall 2023.

if a hardware cycle takes ~3 years then fall 2026 would be the first possible device generation where apple exploits its advantage with the unified ram architecture.

more realistically, spring 2027, since they probably also needed some time to make up their minds to lean into that on the top end.

that`s also how i would interpret the recent rumors on m6 and m7.

naturally, the cooling and all that will be optimized around that.

so the first devices that are actually intended and designed for this use case will come at the earliest this fall and more likely in q1/q2 next year.

you are basically paying the price now to be on the bleeding (sweating) edge

somewhatrandom9yesterday at 10:03 PM

Try using DwarfStar 4 and use the --power flag: https://github.com/antirez/ds4#reducing-heat-power-usage-and...

show 1 reply
stiraytoday at 5:07 PM

I am using MacBook Pro M4 with 64GB of RAM and I have it on direct path of air conditioning airflow, 40ish cm from the device, while running LM Studio opened to network. No noise, not hot to the touch.

Using linux for actual work on my workstation.

c7byesterday at 9:06 PM

This. Do consider local LLMs, but set aside a dedicated machine for it. Connect via VPN or reverse proxy. If it's not a Mac them I'd also put a server distro on it. No need for a desktop environment, save your RAM.

show 1 reply
geophileyesterday at 8:30 PM

That's exactly what I'm doing -- Mini M4 Pro 64GB, qwen3.6.

My hearing is not great, but I think I would have noticed the fan, and I have never heard it. In fact, I had to google to find out if it even has a fan.

show 1 reply
oceanplexianyesterday at 7:31 PM

If you want to do coding with a local LLM your best bet is a 6 year old Nvidia 3090 which is substantially more powerful than the highest end overhyped Apple product for 1/5th the price.

show 4 replies
overgardyesterday at 8:56 PM

I'm running an M5 Max 128GB with Qwen 3.6 and unreal engine in the background and it seems to be ok for me. Quite a power drain if it's not plugged in but I haven't seen any thermal issues.

amatechatoday at 2:04 AM

I wonder if that's why there is such a good selection of 128gb M5 MBP's on the Apple Certified Refurbished store lol https://www.apple.com/ca/shop/refurbished/mac/macbook-pro-12...

show 1 reply
blaguitoday at 1:16 PM

So the sweet spot for dev in 2026 is 64k context windows? Are we back in 2024?

As more context will degrade a lot the t/s. On top this is 1 slot.

If you use sub agents the kv cache will be invalidated with colliding request and make it even slower.

So the in real world 256k (the max qwen offer) and using 3-4 slots the numbers are very different.

This is the major issue with so many postes over local models not benchmarking real world use. Real context and not taking this in context.

If you use 1 slot the issue, you loose the ability of using sub agents when exploring and all end up in the main agent context overloading it, triggering compactation and oh boy with 64k context that compecation will be an endless loop.

What tasks you would really be able to do with 64k context 1 agent? For sure so quick edits but not complex planning where you need to ingest a lot files and end up loosing 80% of the ingested files to compactation.

Arubisyesterday at 7:41 PM

Don't forget that your OLED screen will start to color-shift as the heat cooks the panel!

show 1 reply
PeterStuertoday at 7:24 AM

No laptop is thermally designed to handle sustained high workloads. The whole point of a laptop is to keep it thin, quiet and light, the exact opposite of what cooling needs.

trollbridgetoday at 2:24 AM

Or just buy an R9700 and put it in the basement?

b3ingtoday at 2:40 PM

You can use a fan app to ramp up how fast the fans spin instead of the default so you can prevent any throttling

xd1936yesterday at 8:00 PM

Apple does not currently sell a Mac Mini with 64GB RAM.

show 2 replies
Arch-TKyesterday at 11:33 PM

It's okay, completely wrong thread for this statement, but I wouldn't voluntarily use current MacOS (no idea if the older variants weren't terrible) over anything but ssh. Worse than Windows 11.

show 2 replies
toephu2yesterday at 9:36 PM

I just checked apple's website and configured them:

Mac Studio: Ships: 16–18 weeks

Mac mini: Ships: 10–12 weeks

jarek83today at 2:15 PM

You can't buy Mac Mini with 64GB RAM today. Most what you can have is 48GB

staredyesterday at 10:23 PM

Yes, it gets really hot really fast.

As much as I was tempted to use it on longer projects, I had some reservations about whether it would put too much strain on my MacBook.

cosmic_cheeseyesterday at 8:26 PM

They really need to release those updated Studios already.

show 1 reply
cmgbhmyesterday at 8:04 PM

A local model on my m2 made me come to that conclusion but I definitely was having “that config is $2k more” regret. Thanks for posting this!

Matlyesterday at 8:33 PM

> If you want to run Qwen3.6 27B / 35B at its best, get a MacMini M4 with 64GB of RAM and put it in the basement - or at least a few meters from your desk.

Can confirm this works rather well, most things that integrate with LLMs, (agents, editors), support providing a remote (LAN) URL for Ollama, LM Studio etc.

But you do need a fast LAN connection, otherwise working with agents will be a pain.

show 2 replies
Aperockytoday at 9:34 AM

Thank you - I was very close but thanks to chores and availability haven't pulled the trigger. You are very convincing.

SkitterKherpiyesterday at 7:33 PM

I am considering getting something like NVIDIA's RTX Spark when it comes out, though even that will be limited to 128GB.

show 2 replies
seunosewatoday at 3:22 AM

You can get some work done by using low power mode even when plugged in, and making your fan start running when the temps just start to rise (maybe 40 degrees. Use a third party fan app to set it up

bilekasyesterday at 10:17 PM

Can you define "serious programming"? Because I use it to implement things I COULD go and figure out like algorithms or test generation or evaluations etc, the "serious" programming I tend to do myself. That is what I'm paid for.

show 1 reply
zkmontoday at 12:16 PM

The Q6_K gguf fits nicely on a 24GB GPU. That's amazing.

seanmcdirmidyesterday at 8:11 PM

What sort of M5 are you running? A max? MacMini's don't offer max CPUs.

show 1 reply
Abishek_Muthiantoday at 3:24 AM

>Sure you can use it in clamshell mode

Wouldn't this damage the MBP display?

My RTX laptop has air intake underneath the keyboard and clamshell mode is surely a recipe for disaster; I've taken numerous measures to ensure that the laptop doesn't stay awake when the lid is down.

jarjourayesterday at 8:16 PM

TBF, I just recently picked up this same model, and it's reminding me of the last gen Intel i9 MBP. Just visiting any non-basic website spins up the fans and battery life isn't great either. Yes, this thing is fast, but damn it gets hot just using it for normal tasks.

Still, I don't agree. I think this machine is meant to use local models. You just have to wear pants if you want to keep it directly on your lap. I rarely use it that way anyway. I prefer it plugged into an external display and comfortably sitting on a laptop stand.

show 4 replies
kamranjontoday at 8:55 AM

I completely disagree, it is probably the best platform currently for this - and the way I run it is as a server with tailscale accessible from my coding machine (same as you suggest here) - the difference is that you can stop the server, use it as a video editing rig on a whim, or use it for training instead of inference (yes PyTorch has caught up and Metal is a great platform for this now).

It’s just so flexible, and I even use it in agent mode (ds4) directly on the machine as well sometimes (it’s really not that bad, I’m often running inference for small side projects on my couch), if there is another machine that can do all of this and still function as one of the more ergonomic, well built, and compact laptops out there, I’d love to hear what it is cause I’d likely be interested!

verdvermyesterday at 7:30 PM

Get an OEM Spark instead, mine are silent and can fit 2 qwen/gemma at 8bit or give you room for a bunch of other, smaller models (embed,rerank,etc)

m3kw9today at 12:14 PM

Your MacBook will not last running current big LLMs on these hardware. The heat will wear on it.

throwaway240403today at 1:18 AM

No, buy a framework desktop.

pistoriusptoday at 7:13 AM

Mac Mini in the rack and a Neo in the lap.

singpolyma3yesterday at 9:12 PM

With 128 you can run 122b ;)

kelchmtoday at 1:15 PM

This -- with the M5 Max MBP is running flat out, you'll go from full battery to empty in under two hours.

While it is wild to have this much power in a take-it-anywhere laptop form factor, I sort of regret not just going for a Mac Studio + base M5 MBP.

codazodayesterday at 9:03 PM

Today the Mini tops out at 48GB. Gotta go to the Studio to get 64GB.

show 1 reply

🔗 View 11 more replies