The experimental SSD streaming feature (author's demo: https://x.com/antirez/status/2062536214675067322, recently merged into the main branch) is great news for that project, since it allows SOTA inference (DeepSeek V4 Flash and Pro!) on RAM-limited machines. What's needed now is work on large-ish scale batching to recover tok/s in the SSD streaming scenario. Batching doesn't help when running normally, without SSD streaming (at least not on Apple Silicon), since thermal/power throttling is the constraint in that case, but SSD streaming is a whole different situation.
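
To make the batching argument concrete, here is a rough back-of-the-envelope sketch, not anything taken from the project itself. It assumes a toy model where each decode step costs the larger of the time to stream weights from SSD and the time to compute on them, and where the streamed weights are read once per step and shared by every sequence in the batch. That last assumption is optimistic if different sequences need different weights (e.g. with mixture-of-experts routing). The function name and all numbers are hypothetical, chosen only for illustration.

```python
def tokens_per_second(batch_size, bytes_per_step, ssd_bytes_per_s, compute_s_per_token):
    """Toy throughput model for decoding with weights streamed from SSD.

    Assumptions (illustrative, not measured):
      - bytes_per_step bytes are read from SSD once per decode step and
        reused by every sequence in the batch;
      - compute time grows linearly with batch size;
      - streaming and compute overlap perfectly, so a step costs the
        larger of the two.
    """
    stream_s = bytes_per_step / ssd_bytes_per_s
    compute_s = batch_size * compute_s_per_token
    step_s = max(stream_s, compute_s)
    # Each step yields one new token per sequence in the batch.
    return batch_size / step_s


if __name__ == "__main__":
    # Hypothetical numbers: 10 GB streamed per step, 5 GB/s SSD,
    # 20 ms of compute per token.
    for batch in (1, 2, 4, 8, 16, 32, 64, 128, 256):
        rate = tokens_per_second(batch, 10e9, 5e9, 0.02)
        print(f"batch={batch:4d}  aggregate tok/s={rate:.2f}")
```

Under these assumptions, aggregate tok/s scales with batch size while the SSD read dominates the step, then flattens once compute becomes the bottleneck. Per-sequence latency does not improve; only total throughput does.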