zozbot234 • yesterday at 9:24 PM • 1 reply

True, but a cluster built on pipeline parallelism can naturally stream from multiple SSDs in parallel. That probably makes offload somewhat more effective. And you also have RAM caching available as a natural possibility.

Replies

bigyabai • yesterday at 9:34 PM

You won't be RAM caching much of anything with experts that are 220B parameters' worth of layers.
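
For a rough sense of scale, here is a back-of-the-envelope sketch of how much memory 220B parameters of weights would take. The bit widths are assumptions for illustration (the thread does not say which quantization is in play), and `weights_size_gb` is a hypothetical helper name, not anything from a real library.

```python
def weights_size_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate raw weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# 220B is the figure quoted in the comment above.
EXPERT_PARAMS = 220e9

# Assumed precisions, chosen only for illustration.
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{weights_size_gb(EXPERT_PARAMS, bits):.0f} GB")
```

This counts weights only and ignores activations, KV cache, and any file-format overhead, so it is a lower bound on what a RAM cache would need to hold to cover all of those layers.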
