Hacker News

embedding-shape, last Friday at 5:00 PM

Yes, that's why I'm asking what exactly four 3090s get in prompt processing and generation. Sorry if I was unclear.


Replies

mips_avatar, last Friday at 7:29 PM

Maxes out around 4K tok/s output. Each pair of 3090s has its own instance of the model, with parallelism across the NVLink bridge, though NVLink is only 2x over PCIe 5.
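
For anyone trying to picture the layout, here is a rough Python sketch of the setup described above: four GPUs split into two replicas of two GPUs each, with requests alternated between them. The helper `replica_layout`, the assumption that GPUs 0,1 and 2,3 are the bridged pairs, the round-robin dispatch, and the reading of 4K tok/s as the total across both replicas are all illustrative assumptions, not details from the comment.

```python
from itertools import cycle


def replica_layout(num_gpus: int, gpus_per_replica: int) -> list[dict]:
    """Split GPU indices into fixed-size groups, one group per model replica."""
    if num_gpus % gpus_per_replica != 0:
        raise ValueError("num_gpus must be a multiple of gpus_per_replica")
    layout = []
    for start in range(0, num_gpus, gpus_per_replica):
        gpu_ids = list(range(start, start + gpus_per_replica))
        layout.append({
            "gpu_ids": gpu_ids,
            # CUDA_VISIBLE_DEVICES restricts a process to the listed GPUs.
            "env": {"CUDA_VISIBLE_DEVICES": ",".join(map(str, gpu_ids))},
        })
    return layout


# Assumption: GPUs 0,1 share one bridge and GPUs 2,3 share the other.
layout = replica_layout(num_gpus=4, gpus_per_replica=2)

# Hypothetical round-robin dispatch: alternate requests between replicas.
next_replica = cycle(range(len(layout)))
assignments = [next(next_replica) for _ in range(6)]

# Assumption: ~4K tok/s is the aggregate across both replicas.
aggregate_tok_s = 4000
per_replica_tok_s = aggregate_tok_s / len(layout)
```

Under those assumptions each replica would account for roughly half of the quoted figure. Which physical GPUs are actually bridged depends on the machine, so the index pairs would need checking against the real topology.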