Time for my daily "HBF is coming" comment.
The next step for models is to put the weights on flash (HBF, High Bandwidth Flash), connected to the accelerator through a very wide interface. The first users will be datacenters, but it should eventually trickle down to consumer hardware. A single 512GB stack is expected to cost about $200 and to provide 1.6TB/s of read bandwidth.
You still need some fast DRAM for the KV cache and for activations, but weights should be sitting on flash.
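To make those figures concrete, here is a minimal back-of-envelope sketch. It assumes the quoted projections (512GB, ~$200, 1.6TB/s) hold, uses decimal units (1 TB = 1000 GB), and assumes decoding is bound purely by reading weights off the stack. The function names and the 32GB "weights read per step" figure for a sparse model are hypothetical, picked only for illustration.

```python
"""Back-of-envelope HBF arithmetic (illustrative sketch, not a datasheet).

Stack figures are the projections quoted in the comment; units are decimal
(1 TB = 1000 GB). Decode speed assumes weight reads are the only bottleneck.
"""

STACK_CAPACITY_GB = 512.0
STACK_PRICE_USD = 200.0
STACK_READ_BW_GB_PER_S = 1600.0  # 1.6 TB/s expressed in GB/s


def cost_per_gb(price_usd: float, capacity_gb: float) -> float:
    """Dollars per GB of flash capacity."""
    return price_usd / capacity_gb


def sweep_time_s(weights_gb: float, read_bw_gb_per_s: float) -> float:
    """Seconds needed to stream `weights_gb` of weights off the stack once."""
    return weights_gb / read_bw_gb_per_s


def decode_steps_per_s(weights_read_per_step_gb: float, read_bw_gb_per_s: float) -> float:
    """Upper bound on decode steps per second if weight reads are the bottleneck.

    Each step must read every weight it touches once. Batching shares that read
    across sequences, so this bounds per-sequence speed, not aggregate throughput.
    """
    return 1.0 / sweep_time_s(weights_read_per_step_gb, read_bw_gb_per_s)


if __name__ == "__main__":
    print(f"cost per GB: ${cost_per_gb(STACK_PRICE_USD, STACK_CAPACITY_GB):.2f}")
    # Dense model that fills the whole stack: all 512 GB are read every step.
    dense = decode_steps_per_s(STACK_CAPACITY_GB, STACK_READ_BW_GB_PER_S)
    print(f"dense, 512 GB per step: {dense:.3f} tokens/s per sequence")
    # Hypothetical sparse (MoE-style) model that only touches 32 GB per step.
    sparse = decode_steps_per_s(32.0, STACK_READ_BW_GB_PER_S)
    print(f"sparse, 32 GB per step: {sparse:.1f} tokens/s per sequence")
```

Under these assumptions, a dense model filling the stack needs about 0.32s per full sweep, i.e. roughly 3 tokens/s per sequence, while a model that only reads 32GB per step gets roughly 50. This is why the amount of weights read per token matters so much for whether flash-resident weights are practical.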
Reading from flash is too power-intensive compared to DRAM; this is why flash offload isn't used in the datacenter today. Flash also wears out with repeated writes, so ephemeral, constantly rewritten data like the KV cache can't really be stashed there. Unless your model has an unprecedented level of sparsity, I just don't see how HBF could ever be useful.