Hacker News

embedding-shape, last Friday at 4:00 PM

> Super fast inference

How fast is "super fast" exactly, and with what runtime+model+quant specifically? Curious to see how 4x 3090s compare to 1x Pro 6000. You could probably put together 4x 3090s for a fraction of the cost of the Pro 6000, but whenever I've seen the tok/s in/out for multi-GPU setups, my heart always drops a little.


Replies

mips_avatar, last Friday at 4:43 PM

I haven't benchmarked against a Pro 6000; it's more that I have 4 3090s and I don't have a Pro 6000.
