Tuna-Fish • today at 2:03 AM • 2 replies

Time for my daily "HBF is coming" comment.

The next step for models is to put the weights on flash, connected to the accelerator through a very wide interface. The first users will be datacenters, but it should trickle down to consumer hardware eventually. A single 512GB stack is expected to cost about $200 and provide 1.6TB/s of read bandwidth.

You still need some fast DRAM for the KV cache and for activations, but weights should be sitting on flash.
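
To put the quoted figures in perspective, here is a rough back-of-envelope sketch. It assumes decoding is purely read-bandwidth-bound at batch size 1 and that each generated token streams the active weights exactly once; real systems batch requests and have other bottlenecks, so treat the result as an upper bound, not a prediction. The function name and the `active_fraction` parameter (the share of weights actually read per token, e.g. for a sparse model) are hypothetical, introduced only for illustration.

```python
# Figures quoted in the comment above (decimal units: 1 TB = 1000 GB).
STACK_CAPACITY_GB = 512     # capacity of one stack
STACK_READ_GB_PER_S = 1600  # 1.6 TB/s of reads
STACK_COST_USD = 200        # expected cost per stack


def max_tokens_per_second(weights_gb, read_gb_per_s=STACK_READ_GB_PER_S,
                          active_fraction=1.0):
    """Upper bound on batch-1 decode rate when reads are the only limit.

    Assumes every token reads `weights_gb * active_fraction` GB once.
    `active_fraction` is an illustrative assumption, not a quoted figure.
    """
    if weights_gb <= 0:
        raise ValueError("weights_gb must be positive")
    if not 0 < active_fraction <= 1:
        raise ValueError("active_fraction must be in (0, 1]")
    return read_gb_per_s / (weights_gb * active_fraction)


# Time to stream a completely full stack once, in seconds.
full_stack_read_s = STACK_CAPACITY_GB / STACK_READ_GB_PER_S

# Cost per GB of capacity at the quoted price.
cost_per_gb_usd = STACK_COST_USD / STACK_CAPACITY_GB

if __name__ == "__main__":
    print(f"full-stack read time: {full_stack_read_s:.2f} s")
    print(f"cost per GB: ${cost_per_gb_usd:.2f}")
    print(f"dense model filling the stack: "
          f"{max_tokens_per_second(STACK_CAPACITY_GB):.3f} tokens/s max")
```

Under these assumptions, a dense model that fills the whole stack tops out at roughly 3 tokens per second per stack for a single stream, which is why batching and the fraction of weights read per token matter so much for this design.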

Replies

zozbot234 • today at 5:16 AM

Reading from Flash is too power-intensive compared to DRAM, which is why Flash offload isn't used in the data center today. Flash is also prone to wearing out quickly, so ephemeral data like the KV-cache can't really be stashed there. Unless your model has an unprecedented level of sparsity, I just don't see how HBF could ever be useful.

nickpsecurity • today at 3:23 AM

You're thinking in a provably-useful direction:

https://arxiv.org/pdf/2312.11514
