Hacker News

alfiedotwtf, yesterday at 6:44 AM

If 8 x RTX 6000 is getting you 20s before initial token, how are cloud vendors doing this?


Replies

CamperBob2, yesterday at 4:39 PM

RTX 6000s are great, but they are several times slower than a real datacenter-grade GPU. They still use GDDR memory rather than HBM, for example.