That's good information. I couldn't even begin to run DeepSeek Flash on my system, but if you're assuming multiple GPUs, that is going to affect the napkin math.
The point is that tok/s/GPU stays roughly stable. So you need, say, 4 GB200s minimum to fit the model, but that gives you 4x the tok/s of 1 GPU.
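To make that napkin math concrete, here's a rough sketch in Python. All the numbers are made-up placeholders (not real GB200 specs or measured throughput), picked only so the minimum comes out to 4 GPUs, and it assumes per-GPU throughput stays perfectly flat as you add GPUs, which is an idealization. The function names are hypothetical too.

```python
import math


def min_gpus_to_fit(model_gb: float, usable_gb_per_gpu: float) -> int:
    """Smallest number of GPUs whose combined usable memory holds the model."""
    return math.ceil(model_gb / usable_gb_per_gpu)


def cluster_tok_s(num_gpus: int, tok_s_per_gpu: float) -> float:
    """Total throughput, assuming tok/s/GPU stays constant (idealized)."""
    return num_gpus * tok_s_per_gpu


# Placeholder inputs: assumptions for illustration only, not real specs.
MODEL_GB = 700.0           # hypothetical memory footprint of the model
USABLE_GB_PER_GPU = 180.0  # hypothetical usable memory per GPU
TOK_S_PER_GPU = 50.0       # hypothetical per-GPU throughput

n = min_gpus_to_fit(MODEL_GB, USABLE_GB_PER_GPU)
total = cluster_tok_s(n, TOK_S_PER_GPU)
print(f"min GPUs: {n}, total tok/s: {total}, tok/s/GPU: {total / n}")
```

So the minimum GPU count sets the entry cost, but the cost per token comes out about the same as the hypothetical single-GPU case, since total throughput scales with the GPU count.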