Curious which models you're able to run, and how many 3090s do they require at scale?
4 3090s, with NVLink on each pair. Super fast inference on MoE models around 20-36B.