
embedding-shape • today at 11:10 AM

I think many would assume "not enterprise" or "not datacenter grade" when someone says "Standard GPUs", but maybe that specific phrase has a specific meaning I'm not familiar with.

Edit: I just tried a 4B model on an RTX Pro 6000 and got ~500 tok/s with llama.cpp at default settings, without trying to optimize or change anything. I'm sure vLLM would be a lot faster, even before manually tuning configs. I wouldn't call that card a "Standard GPU" either, FWIW, but it makes the claimed performance numbers feel less exciting, especially given the hardware they were using.

Replies

ismailmaj • today at 11:13 AM

I expected a 4090, maybe 2. I did not expect 8xH200 for a 2B model.
