I bet there’s gonna be a banger of a Mac Studio announced in June.
Apple really stumbled into making the perfect hardware for home inference machines. Does any hardware company come close to Apple in terms of unified memory and single machines for high throughput inference workloads? Or even any DIY build?
When it comes to the previous “pro workloads,” like video rendering or software compilation, you’ve always been able to build a PC that outperforms any Apple machine at the same price point. But inference is unique because its performance scales with high memory throughput, and you can’t assemble that by wiring together off the shelf parts in a consumer form factor.
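As a rough back-of-envelope illustration (the bandwidth and model-size numbers below are approximations I'm plugging in, not measurements): generation speed is basically bounded by how fast you can stream the weights.

    # During decoding, each generated token has to stream (roughly) all of the
    # model's weights through the memory bus once, so tokens/sec is capped at
    # roughly bandwidth / model size in bytes.

    def decode_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
        return bandwidth_gb_s / model_size_gb

    m3_ultra_bw = 819      # GB/s, Apple's quoted M3 Ultra figure
    desktop_ddr5_bw = 90   # GB/s, typical dual-channel desktop DDR5
    model_70b_q4 = 40      # GB, ~70B params at ~4.5 bits/param

    print(decode_tokens_per_sec(m3_ultra_bw, model_70b_q4))      # ~20 tok/s ceiling
    print(decode_tokens_per_sec(desktop_ddr5_bw, model_70b_q4))  # ~2 tok/s ceiling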
It’s simply not possible to DIY a homelab inference server better than the M3+ for inference workloads, at anywhere close to its price point.
They are perfectly positioned to capitalize on the next few years of model architecture developments. No wonder they haven’t bothered working on their own foundation models… they can let the rest of the industry do their work for them, and by the time their Gemini licensing deal expires, they’ll have their pick of the best models to embed with their hardware.
Jeff Geerling building that 1.5TB cluster out of 4 Mac Studios was pretty much all the proof needed to show that the Mac Pro is struggling to find a place anymore.
https://www.jeffgeerling.com/blog/2025/15-tb-vram-on-mac-stu...
> Apple really stumbled into making the perfect hardware for home inference machines
For LLMs. For inference with other kinds of models, where the amount of compute needed relative to the amount of data transfer is higher, Apple is less ideal and systems with lower memory bandwidth but more FLOPS shine. And if things like Google’s TurboQuant work out for efficient kv-cache quantization, Apple could lose a lot of that edge for LLM inference too, since that would reduce the amount of data shuffling relative to compute.
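To put a rough number on how much the kv-cache contributes to that data shuffling, here's a sketch of its size for a 70B-class model (the layer/head counts are assumptions for illustration) and how much a ~4-bit cache quantization would cut it:

    # KV cache bytes ~= 2 (K and V) * layers * kv_heads * head_dim * seq_len * bytes/elem
    def kv_cache_gb(layers, kv_heads, head_dim, seq_len, bytes_per_elem):
        return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem / 1e9

    # Assumed 70B-class shape: 80 layers, 8 KV heads (GQA), head_dim 128
    print(kv_cache_gb(80, 8, 128, 32_000, 2.0))  # fp16 cache at 32k context: ~10.5 GB
    print(kv_cache_gb(80, 8, 128, 32_000, 0.5))  # ~4-bit cache (ignoring scales): ~2.6 GB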
> ...making the perfect hardware for home inference machines.
I really don't get why anybody would want that. What's the use case there?
If someone doesn't care about privacy, they can use the for-profit services, because those are basically losing money while trying to corner the market.
If they care about privacy, they can rent cloud instances: set up, run, shut down. It will be cheaper, faster (if they can afford it), and with no upfront cost per project. This can be done with a lot of scaffolding, e.g. Mistral or HuggingFace, or without, e.g. AWS/Azure/GoogleCloud, etc. The point being that you do NOT purchase the GPU or even dedicated hardware, e.g. Google TPUs; you rent what you actually need, and when the next gen is out you're not stuck with the "old" gen.
So... what use case is left? Somebody who is both technical, very privacy conscious AND wants to do it offline, despite having 5G or satellite connectivity pretty much anywhere?
I honestly don't get who that's for (and I did try dozens of local models, so I'm actually curious).
PS: FWIW https://pricepertoken.com might help, but I'm not sure it shows the infrastructure each provider relies on for comparison. If you have a better link please share back.
DGX workstations: expensive, but they allow PCIe cards as well.
https://marketplace.nvidia.com/en-us/enterprise/personal-ai-...
I'm not a big fan of reducing computing as a whole to just inference. Apple has done quite a bit besides that, and it deserves credit. The Mac Pro disappearing from the product line is a testament to that: their compact machines can cover all needs, not just local inference, to the point that an expandable tower is no longer required at all.
CUDA 13 on Linux solves the unified memory problem via HMM and llama.cpp. It’s an absolute pain to get running without disabling Secure Boot, but that should be remedied literally next month with the release of Ubuntu 26.04 LTS. Canonical is incorporating signed versions of both the new Nvidia open driver and CUDA into its own repo system, so look out for that. Signed Nvidia modules do already exist right now for RHEL and AlmaLinux, but those aren’t exactly the best desktop OSes.
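If you want to try it today, the knob I believe llama.cpp exposes for this is an environment variable that makes its CUDA backend allocate managed memory, so anything that doesn't fit in VRAM spills to system RAM (Linux only, as I understand it). Rough sketch, with a placeholder model path:

    import os
    import subprocess

    env = dict(os.environ)
    # As I understand it, llama.cpp's CUDA backend allocates managed memory when
    # this is set, so layers that don't fit in VRAM spill to system RAM via HMM
    # instead of failing outright. Linux only.
    env["GGML_CUDA_ENABLE_UNIFIED_MEMORY"] = "1"

    subprocess.run(
        [
            "./llama-server",
            "-m", "models/some-large-model.gguf",  # placeholder path
            "-ngl", "999",  # offload as many layers as possible to the GPU
            "-c", "8192",   # context size
        ],
        env=env,
        check=True,
    )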
But yeah, right now Apple actually has the price-to-performance crown a lot of the time if you’re buying a new computer just in general.
To me there is a fundamental difference. Even if PC hardware costs slightly more (right now because of the RAM situation; Apple, producing its chips in house, can of course get better deals), it's something that is more worth investing in.
Maybe you spend $1000 more for a PC of comparable performance; well, tomorrow you need more power, so you change or add another GPU, add more RAM, add another SSD. A workstation you can keep upgrading for years, paying a small cost for each bump in performance.
An Apple machine is basically disposable: no component inside can be upgraded. You need more RAM? Throw it away and buy a new one. You want a new GPU technology? You have to change the whole thing. And if something inside breaks? You of course throw away the whole computer, since everything is soldered onto the mainboard.
Then there is the software issue: with Apple devices you are forced to use macOS, which kind of sucks, especially for server usage. True, nowadays you can install Linux on it, but the GPU isn't that well supported, so you lose all the benefits. You're stuck with an OS that sucks, while in the PC market you have plenty of OS choices: Windows, a million Linux distributions, etc. If I need a workstation to train LLMs, why do I care about an OS with a GUI? It's only a waste of resources; I just need a thing that runs Linux that I can SSH into. Also, I don't get the benefit of using containers, Docker, etc.
Macs suck even on the hardware side from a server point of view: for example, it's not possible to rack mount them, it's not possible to have redundant PSUs, they don't offer remote KVM capability, etc.
Agreed. I’m planning on selling my 512GB M3 Ultra Studio in the next week or so (I just wrenched my back so I’m on bed-rest for the next few days) with an eye to funding the M5 Ultra Studio when it’s announced at WWDC.
I can live without the RAM for a couple of months to get a good price for it, especially since Apple don’t sell that model (with the RAM) any more.
As to better or cheaper homelab: depends on the build. AMD AI Max builds do exist, and they also use unified memory. I could argue the competition was, for a long time, selling much more affordable RAM, so you could get a better build outside Apple Silicon.
The typical inference workloads have moved quite a bit in the last six months or so.
Your point would have been largely correct in the first half of 2025.
Now, you're going to have a much better experience with a couple of Nvidia GPUs.
This is for two reasons: reasoning models require a pretty high number of tokens per second to do anything useful, and we are seeing small quantized and distilled reasoning models working almost as well as the ones needing terabytes of memory.
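To put rough numbers on the first point (the trace length and speeds here are just illustrative assumptions):

    # How long a long reasoning trace takes at different decode speeds.
    reasoning_tokens = 15_000  # assumed chain-of-thought length for a hard task

    for toks_per_sec in (15, 60, 150):
        minutes = reasoning_tokens / toks_per_sec / 60
        print(f"{toks_per_sec:>4} tok/s -> {minutes:.1f} min")
    # ~16.7 min at 15 tok/s, ~4.2 min at 60 tok/s, ~1.7 min at 150 tok/s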
Apple abandoned the pro market long before ever releasing the current iteration of Mac Pro. I doubt they care about getting it back, considering it's a smaller niche of consumers and probably significantly more investment on the software side.
At best we probably get a chassis to awkwardly daisy chain a bunch of Mac Studios together
The interesting question is whether they'll lean into it intentionally (better tooling, more ML-focused APIs) or just keep treating it as a side effect of their silicon design
For LLMs and other purely memory-bound workloads, yes, but for e.g. diffusion models their FPU/SIMD performance is lacking.
What part of your workflow relies on home LLM inference?
Just a reminder that the old Intel Mac Pro could handle 1.5TB of RAM ... today's Mac Studio can only handle 0.25TB.
Seems odd that a computer from a decade ago could take more than 1TB more RAM than what we can buy today from Apple.
> home inference machines.
The market for this use case is tiny
The new M chips beat basically any PC on video editing. Their new ProRes accelerator chiplet is so good that PCs can't even compete.
I do love the Mac Studio. I had a 2019 Mac Pro, the Intel cheesegrater, but my home office upstairs became unpleasant with it pushing out 300W+. I replaced it with the M2 Ultra Studio for a fraction of the heat output (though I did have to buy an OWC 4xNVMe bay).
> I bet there’s gonna be a banger of a Mac Studio announced in June. Apple really stumbled into making the perfect hardware for home inference machines.
This I'm not actually so sure about. The current Studio offerings have done away with the 512GB memory option. I understand the RAM situation, but they didn't change pricing, they just discontinued it. So I'm curious to see what the next Studio is like. I'd almost love to see a Studio with even one PCIe slot: make it a bit taller, give it a slide-out cover...
Framework offers the Ryzen AI Max with 128GB of unified RAM for $2,699.
That's a pretty good deal I would think
https://frame.work/de/de/products/desktop-diy-amd-aimax300/c...
Still, running 2 to 4 5090s will beat anything Apple has to offer for both inference and training.
> Apple really stumbled into making the perfect hardware for home inference machines
Apple are winning a small battle for a market that they aren’t very good in. If you compare the performance of a 3090 and above vs any Apple hardware you would be insane to go with the Apple hardware.
When I hear someone say this it’s akin to hearing someone say Macs are good for gaming. It’s such a whiplash from what I know to be reality.
Or another jarring statement - Sam Altman saying Mario has an amazing story in that interview with Elon Musk. Mario has basically the minimum possible story to get you to move the analogue sticks. Few games have less story than Mario. Yet Sam called it amazing.
It’s a statement from someone who just doesn’t even understand the first thing about what they are talking about.
Sorry for the mini rant. I just keep hearing this apple thing over and over and it’s nonsense.
I don't think Apple just stumbled into it, and while I totally agree that Apple is killing it with their unified memory, I think we're going to see a pivot from NVidia and AMD. The biggest reason, I think, is: OpenAI has committed to an enormous amount of capex it simply cannot afford. It does not have the lead it once did, and most end-users simply do not care. There are no network effects. Anthropic at this point has completely consumed, as far as I can tell, the developer market, the one market that is actually passionate about AI. That's largely due to a huge advantage of the developer space: end users cannot tell whether an "AI" or a human wrote the code. That's not true for almost every other application of AI at this point.
If the OpenAI domino falls, and I'd be happy to admit if I'm wrong, we're going to see a near-catastrophic drop in RAM prices and in hyperscaler demand to, well... scale. That massive drop will be completely and utterly OpenAI's fault for attempting to bite off more than it can chew. In order to shore up demand, we'll see NVidia and AMD start selling directly to consumers. We, developers, are consumers, and we drive demand at the enterprises we work for based on what keeps us both engaged and productive... the end result being: the ol' profit flywheel spinning.
Both NVidia and AMD are capable of building GPUs that absolutely wreck Apple's best. A huge reason for this is that Apple needs unified memory to keep their money maker (laptops) profitable and performant; while it helps their profitability, it also forces them into less performant solutions. If NVidia dropped a 128GB GPU with GDDR7 at $4k, absolutely no one would be looking at a Mac for inference. My 5090 is unbelievably fast at inference even if it can't load gigantic models, and quite frankly the 6-bit quantized versions of Qwen 3.5 are fantastic, but if it could load larger open-weight models I wouldn't even bother checking Apple's pricing page.
tl;dr: competition is as stiff as it is vicious. Apple's "lead" in inference exists only because NVidia and AMD are raking in cash selling to hyperscalers. If that cash cow goes tits up, there's no reason to assume NVidia and AMD won't definitively pull the rug out from under Apple.
> But inference is unique because its performance scales with high memory throughput, and you can’t assemble that by wiring together off the shelf parts in a consumer form factor.
Nvidia outperforms Mac significantly on diffusion inference and many other forms. It's not as simple as the current Mac chips being entirely better for this.