adrian_b • last Monday at 7:40 PM

In an optimized implementation of model inference, SSD access latency does not matter, because no random accesses are performed.

The goal of optimizing model inference for weights stored on SSDs is to read from the SSDs continuously at the maximum throughput the hardware provides, while making sure that all computation and all main-memory accesses are overlapped with the SSD reads.
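
To make the overlap concrete, here is a minimal sketch of the idea, not anything from a real inference engine. A reader thread streams a file sequentially in large chunks into a small bounded queue while the main thread computes on the previous chunk. The names `read_chunks` and `process_streaming`, the chunk size, and the queue depth are all hypothetical choices for illustration, and `compute` stands in for whatever work is done on each block of weights.

```python
import queue
import threading


def read_chunks(path, chunk_size, out_queue):
    """Producer: read the file front to back in large sequential chunks."""
    try:
        # buffering=0 gives raw reads, so each read() maps to one large request.
        with open(path, "rb", buffering=0) as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                # Blocks when the queue is full, which bounds memory use.
                out_queue.put(chunk)
    finally:
        # Sentinel: tells the consumer that no more data is coming.
        out_queue.put(None)


def process_streaming(path, compute, chunk_size=1 << 20, depth=2):
    """Consumer: run compute() on each chunk while the next one is being read.

    depth is the number of chunks allowed in flight (assumed value: 2,
    which is plain double buffering).
    """
    chunks = queue.Queue(maxsize=depth)
    reader = threading.Thread(
        target=read_chunks, args=(path, chunk_size, chunks), daemon=True
    )
    reader.start()

    results = []
    while True:
        chunk = chunks.get()
        if chunk is None:
            break
        # The reader thread keeps fetching the next chunk during this call,
        # because blocking file reads release the interpreter lock.
        results.append(compute(chunk))

    reader.join()
    return results
```

If `compute` takes less time per chunk than the read does, the drive is never idle and total time is set by throughput alone; the latency of any single request is hidden behind the requests already in flight. A production implementation would likely use much deeper queues and asynchronous or direct I/O instead of a Python thread, but the structure is the same.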
