Hacker News

Tuna-Fish, today at 2:03 AM (2 replies)

Time for my daily "HBF is coming" comment.

The next step for models is to put the weights on flash, connected to the accelerator through a very wide interface. The first users will be datacenters, but it should trickle down to consumer hardware eventually. A single 512 GB stack is expected to cost about $200 and provide 1.6 TB/s of reads.
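
A quick back-of-envelope check of the numbers above, as a minimal Python sketch. It takes the quoted figures (512 GB, about $200, 1.6 TB/s) at face value rather than from any datasheet, uses decimal units, and assumes decoding is purely bandwidth-bound, so each generated token requires reading the weights once. The function names are hypothetical.

```python
# Back-of-envelope numbers for the HBF stack quoted in the comment.
# Assumptions (from the comment, not a datasheet): 512 GB per stack,
# about $200 per stack, 1.6 TB/s of reads. Decimal units (1 GB = 1e9 B).

STACK_BYTES = 512e9       # 512 GB
STACK_COST_USD = 200.0    # expected price per stack
READ_BW_BYTES_S = 1.6e12  # 1.6 TB/s


def cost_per_gb(cost_usd: float, capacity_bytes: float) -> float:
    """Dollars per decimal gigabyte."""
    return cost_usd / (capacity_bytes / 1e9)


def full_sweep_seconds(weight_bytes: float, bw_bytes_s: float) -> float:
    """Time to stream every weight once at the given read bandwidth."""
    return weight_bytes / bw_bytes_s


def max_tokens_per_s(bytes_read_per_token: float, bw_bytes_s: float) -> float:
    """Upper bound on single-stream decode rate if decoding is purely
    bandwidth-bound, i.e. each token requires reading these bytes once.
    Ignores compute, KV-cache traffic and batching."""
    return bw_bytes_s / bytes_read_per_token


if __name__ == "__main__":
    print(f"$/GB: {cost_per_gb(STACK_COST_USD, STACK_BYTES):.3f}")
    print(f"full sweep: {full_sweep_seconds(STACK_BYTES, READ_BW_BYTES_S):.2f} s")
    print(f"dense 512 GB model: {max_tokens_per_s(STACK_BYTES, READ_BW_BYTES_S):.2f} tok/s")
```

Under those assumptions the stack works out to roughly $0.39/GB, and streaming a full 512 GB of weights takes about 0.32 s, so a dense model that fills the stack would top out near 3 tokens/s per stream. Batching raises aggregate throughput, because one pass over the weights can serve every sequence in the batch; the sketch ignores that, along with compute and KV-cache traffic.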

You still need some fast DRAM for the KV cache and for activations, but weights should be sitting on flash.
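
To give a sense of how much DRAM the KV cache needs, here is a minimal sketch of the standard sizing formula (keys and values for every layer, KV head, and position). The model dimensions are hypothetical, chosen to resemble a 70B-class model with grouped-query attention, and fp16 storage is assumed. Unlike the weights, this cache grows with every generated token, which is one reason it is usually kept in DRAM.

```python
# Rough KV-cache size for decoding, to show why DRAM is still needed
# alongside flash-resident weights. The model dimensions used below are
# hypothetical (roughly a 70B-class model with grouped-query attention).


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, batch: int = 1, bytes_per_elem: int = 2) -> int:
    """Bytes needed to hold keys and values for every layer and position.
    The leading factor of 2 accounts for storing both K and V;
    bytes_per_elem=2 corresponds to fp16/bf16."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem


if __name__ == "__main__":
    size = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128, seq_len=32_768)
    print(f"{size / 2**30:.1f} GiB per 32k-token stream at fp16")  # 10.0 GiB
```

With those numbers a single 32k-token stream needs 10 GiB of cache; the requirement scales linearly with batch size and context length.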


Replies

zozbot234, today at 5:16 AM

Reading from flash is too power-intensive compared to DRAM; this is why flash offload isn't used in the data center today. Flash is also prone to wearing out quickly, so ephemeral data like the KV cache can't really be stashed there. Unless your model has an unprecedented level of sparsity, I just don't see how HBF could ever be useful.
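
The sparsity caveat can be made concrete with a simple bandwidth-bound model: what matters is how many weight bytes each token actually touches. The sketch below is illustrative only. It assumes a hypothetical 400-billion-parameter model stored at 8 bits per weight, reuses the 1.6 TB/s figure from the parent comment, and does not model power draw or flash wear, which are the main objections raised here.

```python
# How sparsity changes the per-token read volume from weight memory.
# Hypothetical model: 400e9 parameters stored at 1 byte each (8-bit).
# Bandwidth is the 1.6 TB/s figure quoted in the parent comment.
# Power and flash wear are NOT modeled.

READ_BW_BYTES_S = 1.6e12


def bytes_read_per_token(total_params: float, active_fraction: float,
                         bytes_per_param: float = 1.0) -> float:
    """Weight bytes touched per generated token when only a fraction of
    parameters (e.g. the routed experts in a mixture-of-experts model)
    is active. active_fraction=1.0 is a dense model."""
    if not 0.0 < active_fraction <= 1.0:
        raise ValueError("active_fraction must be in (0, 1]")
    return total_params * active_fraction * bytes_per_param


def bandwidth_bound_tok_s(bytes_per_token: float, bw: float = READ_BW_BYTES_S) -> float:
    """Single-stream decode ceiling if weight reads are the only bottleneck."""
    return bw / bytes_per_token


if __name__ == "__main__":
    for frac in (1.0, 0.25, 0.05):
        b = bytes_read_per_token(400e9, frac)
        print(f"active {frac:>5.0%}: {b / 1e9:6.0f} GB/token -> "
              f"{bandwidth_bound_tok_s(b):6.1f} tok/s")
```

In this toy model, going from dense (4 tokens/s) to 5% active parameters (80 tokens/s) raises the decode ceiling twentyfold, which illustrates the kind of sparsity the comment suggests HBF would need.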

nickpsecurity, today at 3:23 AM

You're thinking in a provably useful direction:

https://arxiv.org/pdf/2312.11514 ("LLM in a flash: Efficient Large Language Model Inference with Limited Memory")
