Hacker News

Gigachad, last Monday at 5:01 AM

We still aren't going to be putting 200 GB of RAM on a phone in a couple of years to run those local models.


Replies

Tuna-Fish, last Monday at 11:15 AM

HBF (high-bandwidth flash) is coming fast, with the first parts expected to sample to customers this year.

Flash memory can be optimized to be as fast as, and more energy-efficient than, DRAM at large linear reads; there was just little demand for this before, because doing so costs you roughly half your density and doesn't improve your writes at all. All the flash memory manufacturers have realized that model weights make this a huge opportunity and are now chasing it.

In other words, once the initial price premium settles in a few years, it will be reasonable to put ~500 GB of weights into a device for ~$100 in memory cost.
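
A rough way to see why linear-read bandwidth is the number that matters here: a dense model streams essentially all of its weights once per generated token, so sustained read bandwidth divided by weight size gives an upper bound on decode speed. A back-of-the-envelope sketch in Python; the bandwidth figures are illustrative assumptions, not HBF specs:

    GB = 1e9

    def max_tokens_per_sec(weight_bytes: float, read_bw: float) -> float:
        """Bandwidth-bound ceiling: each token streams all weights once."""
        return read_bw / weight_bytes

    weights = 500 * GB  # the ~500 GB of weights mentioned above
    for name, bw in [("NVMe-class flash", 7 * GB),
                     ("hypothetical HBF stack", 100 * GB),
                     ("HBM-class DRAM", 1000 * GB)]:
        print(f"{name:22s} ceiling ~ {max_tokens_per_sec(weights, bw):.2f} tok/s")

(A mixture-of-experts model only streams its active experts per token, so its effective ceiling is correspondingly higher.)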

jurmous, last Monday at 6:07 AM

We don’t need 200 GB of RAM on a phone to run big models, just 200 GB of storage, thanks to Apple’s “LLM in a flash” research.

See: https://x.com/danveloper/status/2034353876753592372
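
The core trick in that paper is to keep the big FFN matrices on flash and pull in only the rows that a cheap predictor says will activate for the current token. A toy sketch of the access pattern using numpy's memmap; the file, dimensions, and predictor here are made-up stand-ins, and the real paper adds row/column bundling and windowed caching on top:

    import numpy as np

    D_MODEL, D_FF = 1024, 4096  # illustrative dimensions, small for the demo
    rng = np.random.default_rng(0)

    # Fake a weights file on disk so the sketch is self-contained; on a
    # phone this would be the model's real FFN weights sitting in flash.
    np.memmap("ffn_up.bin", dtype=np.float16, mode="w+",
              shape=(D_FF, D_MODEL))[:] = rng.standard_normal((D_FF, D_MODEL))

    # Re-open read-only: nothing is paged into RAM until a row is touched.
    w_up = np.memmap("ffn_up.bin", dtype=np.float16, mode="r",
                     shape=(D_FF, D_MODEL))

    # Stand-in for the paper's cheap low-rank activation predictor.
    proj = rng.standard_normal((D_MODEL, D_FF)).astype(np.float16)

    x = rng.standard_normal(D_MODEL).astype(np.float16)
    rows = np.flatnonzero(x @ proj > 0)   # predicted-active neuron indices
    active = np.asarray(w_up[rows])       # reads only those rows off flash
    h = np.maximum(active @ x, 0)         # FFN up-projection, active rows only
    print(f"loaded {len(rows)}/{D_FF} rows from flash")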

anon373839, last Monday at 10:43 AM

That amount of RAM won’t be necessary. Gemma 4 and comparably sized Qwen 3.5 models are already better than the very best, biggest frontier models were just 12-18 months ago, and they now fit in an 18-36 GB footprint, depending on quantization.
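
The footprint arithmetic is straightforward: weights take roughly params x bits-per-weight / 8, before KV cache and runtime overhead. A quick sketch; the 30B parameter count is a hypothetical stand-in for models in this class:

    def weight_gb(params_b: float, bits: float) -> float:
        """Weights-only footprint in GB, ignoring KV cache and overhead."""
        return params_b * 1e9 * bits / 8 / 1e9

    for bits in (4, 6, 8):
        print(f"30B params @ {bits}-bit ~ {weight_gb(30, bits):.0f} GB")
    # 4-bit ~ 15 GB, 8-bit ~ 30 GB: in line with the 18-36 GB range above
    # once cache and overhead are added.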

herewulf, yesterday at 10:19 PM

My phone connected via mesh VPN to a server at home is local enough for me.
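
This is easy to wire up because common self-hosted servers (llama.cpp's llama-server, Ollama, vLLM) expose an OpenAI-compatible endpoint, so the client just points at the server's VPN address. A minimal sketch; the tailnet IP, port, and model name are placeholders for whatever the home server actually runs:

    from openai import OpenAI

    # Home server reachable over the mesh VPN (e.g. a Tailscale 100.x address).
    client = OpenAI(base_url="http://100.101.102.103:8080/v1",
                    api_key="unused")  # local servers typically ignore the key

    resp = client.chat.completions.create(
        model="local-model",  # whatever model the server has loaded
        messages=[{"role": "user", "content": "Hello from my phone"}],
    )
    print(resp.choices[0].message.content)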

alwillis, last Monday at 5:43 PM

> We still aren't going to be putting 200 GB of RAM on a phone in a couple of years to run those local models.

You can already buy an iPhone with 2 TB of storage. The CPU, GPU, and Neural Engine all share the same pool of RAM, and the SSD is directly connected to all of it. You won’t need 200 GB of RAM to run local models when you essentially have 500 GB of virtual memory.
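
The "virtual memory" framing maps directly onto mmap: the weights file is mapped into the address space, pages fault in from flash on first touch, and the OS evicts cold pages under memory pressure. A bare sketch of that access pattern; the file here is a small stand-in created just so the snippet runs:

    import mmap, os

    # Small stand-in for a multi-hundred-GB weights file on the phone's NAND.
    with open("model_weights.bin", "wb") as f:
        f.write(os.urandom(16 * 1024 * 1024))

    with open("model_weights.bin", "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        # Slicing faults in just the touched pages; under memory pressure
        # the OS drops cold pages and re-reads them from flash on demand.
        layer = mm[4096 : 4096 + 1024 * 1024]  # pull one "layer's" bytes
        mm.close()
    print(len(layer))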

mh-, last Monday at 5:17 AM

A lot of people are making the mistake of noticing that local models have been 12-24 months behind SotA ones for a good portion of the last couple years, and then drawing a dotted line assuming that continues to hold.

It simply... doesn't. The SotA models are enormous now, and there's no free lunch on compression/quantization here.

Opus 4.6-level capabilities are not coming to your laptop or phone (even a 64-128 GB one) with the popular architecture that current LLMs use.
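
To put rough numbers on that: frontier parameter counts aren't published, so the sizes below are assumptions purely for the arithmetic, but even at an aggressive 4 bits per weight a trillion-parameter-class model is nowhere near a 64-128 GB device:

    def weights_gb(params_t: float, bits: float) -> float:
        """Weights-only footprint in GB for a params_t-trillion-param model."""
        return params_t * 1e12 * bits / 8 / 1e9

    # Hypothetical frontier-scale sizes; real counts are not disclosed.
    for p in (0.5, 1.0, 2.0):
        print(f"{p:.1f}T params @ 4-bit ~ {weights_gb(p, 4):,.0f} GB of weights")
    # 0.5T -> 250 GB, 1T -> 500 GB, 2T -> 1,000 GB, before KV cache
    # or any headroom.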

Now, that doesn't mean a much narrower-scoped model with very impressive results can't be delivered. But that narrower model won't have the same breadth of knowledge, and it's TBD whether it's possible to get the quality/outcomes seen with these models without that broad "world" knowledge.

It also doesn't preclude a new architecture or other breakthrough. I'm simply stating it doesn't happen with the current way of building these.

edit: forgot to mention the notion of ASIC-style models on a chip. I haven't been following this closely, but last I saw, the power requirements were too steep for a mobile device.
