If 8 x RTX 6000 is getting you 20s before initial token, how are cloud vendors doing this?
RTX6000s are great but they are several times slower than a real datacenter-grade GPU. They still use DDR memory rather than HBM, for example.
RTX6000s are great but they are several times slower than a real datacenter-grade GPU. They still use DDR memory rather than HBM, for example.