If 8x RTX 6000 takes 20 s to produce the first token, how are cloud vendors doing this?

alfiedotwtf • yesterday at 6:44 AM

RTX 6000s are great, but they are several times slower than a real datacenter-grade GPU. For example, they still use GDDR memory rather than HBM.
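To make the bandwidth point concrete, here is a rough roofline sketch. It assumes one forward pass must both do its arithmetic and stream its weights from memory, so its time is bounded by whichever of the two is slower. The function name and every hardware figure are hypothetical round numbers, not actual RTX 6000 or datacenter-GPU specs.

```python
# Rough roofline estimate: a pass over the model can finish no faster than
# the slower of (a) doing its arithmetic and (b) streaming its bytes from memory.
# All hardware figures below are hypothetical round numbers, not vendor specs.


def pass_time_lower_bound(flops: float, bytes_moved: float,
                          peak_flops: float, mem_bandwidth: float) -> float:
    """Lower bound in seconds for one pass under a simple roofline model."""
    compute_s = flops / peak_flops          # time if purely compute-bound
    memory_s = bytes_moved / mem_bandwidth  # time if purely bandwidth-bound
    return max(compute_s, memory_s)


# Hypothetical per-GPU weight shard: 10 GB at 1 byte per weight (8-bit),
# with roughly 2 FLOPs per weight for each token processed.
WEIGHT_BYTES = 10e9
FLOPS_PER_TOKEN = 2 * WEIGHT_BYTES

PEAK_FLOPS = 1e15  # hypothetical; same compute assumed for both cards
GDDR_BW = 1e12     # hypothetical GDDR-class bandwidth, bytes/s
HBM_BW = 4e12      # hypothetical HBM-class bandwidth, bytes/s

t_gddr = pass_time_lower_bound(FLOPS_PER_TOKEN, WEIGHT_BYTES, PEAK_FLOPS, GDDR_BW)
t_hbm = pass_time_lower_bound(FLOPS_PER_TOKEN, WEIGHT_BYTES, PEAK_FLOPS, HBM_BW)

print(f"GDDR-class: {t_gddr * 1e3:.1f} ms per token")
print(f"HBM-class:  {t_hbm * 1e3:.1f} ms per token")
```

With these made-up numbers, a bandwidth-bound per-token pass is 4x faster on the HBM-class card. For a long prompt, the FLOPs term grows with prompt length and can dominate instead, so raw compute matters for time to first token too.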
