logoalt Hacker News

adrian_blast Monday at 7:40 PM0 repliesview on HN

In an optimized implementation of model inference, the latency of SSD access has no importance, because no random accesses are done.

The purpose of optimizing model inference for weights stored on SSDs is to achieve a continuous reading from SSDs at the maximum throughput provided by hardware, taking care that any computations and any accesses to the main memory are overlapped over the SSDs reading.