Hacker News

ttul yesterday at 9:45 PM (5 replies)

If you do the math (I did), in 2 years, open source models that you can run on a future MacBook Pro will be as capable as the frontier cloud models are today. Memory bandwidth is growing rapidly, as is the die area dedicated to the neural cores. And all the while, the silicon is getting more power efficient and increasingly dense (as it always does). These hardware improvements are arriving while the open source models themselves improve through research advancements. And while the cloud models will always be better (because they can use as much power as they want up in the cloud), what matters to most of us is whether a model can do a meaningful share of knowledge work for us. At the same time, energy consumption to run cloud infrastructure is outpacing the creation of new energy supply, which is a problem not easily solved. I believe scarcity of energy will increasingly drive frontier labs toward power efficiency, which necessarily implies that the performance gap between cloud and local execution will narrow.


Replies

nl today at 12:41 AM

An Opus 4.7/GPT-5.5 class model is 5 trillion parameters[1].

To run an 8-bit quantized version of that you need roughly 5TB of RAM.

Today that is around 18 NVIDIA B300s. That's around $900,000, not including the computers to run them in.
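The arithmetic behind those figures can be sketched in a few lines of Python. This is a rough check, not a sizing tool: the 288 GB of memory per B300 is an assumption that the comment does not state, the per-GPU price is only what the $900,000 total implies, and the sketch counts weights only (no KV cache or activations).

```python
import math

def weights_bytes(params: float, bits_per_param: int) -> float:
    """Bytes needed to hold the weights alone at a given quantization."""
    return params * bits_per_param / 8

def gpus_needed(total_bytes: float, gpu_mem_bytes: float) -> int:
    """Smallest number of GPUs whose combined memory holds the weights."""
    return math.ceil(total_bytes / gpu_mem_bytes)

PARAMS = 5e12              # 5 trillion parameters, as claimed in the comment
B300_MEM_BYTES = 288e9     # ASSUMPTION: 288 GB per B300, not from the comment
TOTAL_COST_USD = 900_000   # figure quoted in the comment

total = weights_bytes(PARAMS, bits_per_param=8)   # 1 byte per parameter
n_gpus = gpus_needed(total, B300_MEM_BYTES)
implied_price = TOTAL_COST_USD / n_gpus           # implied, not a list price

print(f"weights: {total / 1e12:.1f} TB")
print(f"GPUs needed: {n_gpus}")
print(f"implied price per GPU: ${implied_price:,.0f}")
```

Under those assumptions the weights come to 5 TB and fit on 18 GPUs, which matches the numbers in the comment.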

It's true that the capability of open source models is improving, but running actual frontier models on your MBP seems a way off.

[1] https://x.com/elonmusk/status/2042123561666855235?s=20 (and Elon has hired enough people out of those labs to have a fair idea)

npunt yesterday at 10:49 PM

I did this calculation a bit ago and don't think frontier models are just a few MacBook Pro generations away. Yes, numbers reliably go up in tech in general, but semiconductors and standards specifically have long lead times and published roadmaps, so we can have high confidence in what we're getting even 3-4 years out, in terms of both transistor density and RAM speeds.

In mid-2028 we'll have N2E/N2P with around 15% greater transistor density than today's N3P, and by the end of 2028 we'll likely have A14 with about 35-40% density improvement.

Meanwhile, we'll be on LPDDR6 by that point, which takes M-series Pros from 307GB/s to ~400GB/s, and Maxes from 614GB/s to ~800GB/s.

Model improvements obviously will help out, but on the raw hardware front these aren't in the ballpark for frontier model numbers. An H100 has 3TB/s memory bandwidth, fwiw
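To put the bandwidth figures above side by side, here is a minimal Python sketch. The ratios use only the numbers quoted in the comment. The tokens-per-second bound is a common rule of thumb, not something the comment states: it assumes decoding is memory-bandwidth bound and that every weight is read once per generated token. The 100 GB model size is purely hypothetical.

```python
# Memory bandwidth figures quoted in the comment, in GB/s.
PRO_NOW, PRO_LPDDR6 = 307, 400
MAX_NOW, MAX_LPDDR6 = 614, 800
H100 = 3000  # "3TB/s"

def speedup(new: float, old: float) -> float:
    """Ratio of two bandwidth figures."""
    return new / old

def decode_tokens_per_sec_upper_bound(bandwidth_gb_s: float, weights_gb: float) -> float:
    """ASSUMPTION: rule-of-thumb ceiling for a dense model whose weights
    are all read from memory once per generated token."""
    return bandwidth_gb_s / weights_gb

pro_gain = speedup(PRO_LPDDR6, PRO_NOW)
max_gain = speedup(MAX_LPDDR6, MAX_NOW)
h100_vs_future_max = speedup(H100, MAX_LPDDR6)

HYPOTHETICAL_WEIGHTS_GB = 100  # illustrative only, not from the thread

print(f"Pro gain from LPDDR6: {pro_gain:.2f}x")
print(f"Max gain from LPDDR6: {max_gain:.2f}x")
print(f"H100 vs. future Max: {h100_vs_future_max:.2f}x")
for name, bw in [("future Max", MAX_LPDDR6), ("H100", H100)]:
    bound = decode_tokens_per_sec_upper_bound(bw, HYPOTHETICAL_WEIGHTS_GB)
    print(f"{name}: at most ~{bound:.0f} tokens/s on a {HYPOTHETICAL_WEIGHTS_GB} GB model")
```

On these numbers LPDDR6 is roughly a 1.3x step for both chips, while a single H100 would still have 3.75x the bandwidth of the projected Max.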

xorcist yesterday at 11:13 PM

That's not "math". That's a "wild guess", or baseless extrapolation at best.

CMay today at 7:07 AM

So long as you don't require deep search grounding, like massive web indexes or document stores, which are hard to reproduce locally. You can do local agentic things that get close or even do better depending on search strategy, but theoretically a massive cloud service with huge data stores at hand should be able to produce better results.

In practice, unless you're doing some kind of deep research thing with the cloud, it'll try to optimize mostly for time and get you a good enough answer rather than spending an hour or two. An hour of cloud searching with huge data stores is presumably not equivalent to an hour of local agentic searching.

I think that problem will improve a little in the coming years as we get better at optimized data curation, but the information world will keep growing, so the advantage will likely remain with centralized services as long as they offer their complete potential rather than a fraction.

rc1 yesterday at 10:09 PM

Show your working / explain your math?
