zozbot234 • today at 6:10 PM

If you split a model using pipeline (layer) parallelism, a slow interconnect matters little, since only the activations at each stage boundary have to cross it. The cost is that running a single inference at a time is much slower than running a fully pipelined minibatch, because only one stage is busy at any moment. Tensor parallelism, by contrast, exchanges partial results within every layer, so it needs much faster interconnects than you'd get in an average server. I'm not sure a different motherboard would help all that much.
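A rough back-of-envelope sketch of the two trade-offs above, assuming a simple GPipe-style schedule for pipeline parallelism and a Megatron-style tensor split (two all-reduces per transformer layer in the forward pass, each done as a ring all-reduce). The model sizes (80 layers, hidden size 8192, fp16, 4 devices) are hypothetical, and the functions ignore compute overlap, KV-cache traffic and per-message latency.

```python
"""Back-of-envelope: pipeline vs tensor parallelism (hypothetical sizes)."""


def pipeline_utilization(stages: int, microbatches: int) -> float:
    """Fraction of stage-time doing useful work in a GPipe-style forward pass.

    With S stages and M microbatches the pass takes M + S - 1 stage-steps,
    and each stage is busy for M of them.
    """
    return microbatches / (microbatches + stages - 1)


def pipeline_bytes_per_token(stages: int, hidden: int, dtype_bytes: int) -> int:
    """Total bytes crossing the interconnect per token, one forward pass.

    Only the activation vector crosses each of the S - 1 stage boundaries.
    """
    return (stages - 1) * hidden * dtype_bytes


def tensor_parallel_bytes_per_token(
    devices: int, layers: int, hidden: int, dtype_bytes: int
) -> int:
    """Total bytes crossing the interconnect per token, one forward pass.

    Assumes two all-reduces per layer. In a ring all-reduce each of the P
    devices sends 2 * (P - 1) / P of the buffer, i.e. 2 * (P - 1) buffers
    in total, so two per layer gives 4 * (P - 1) buffers per layer.
    """
    return 4 * layers * (devices - 1) * hidden * dtype_bytes


if __name__ == "__main__":
    DEVICES, LAYERS, HIDDEN, FP16 = 4, 80, 8192, 2  # hypothetical model

    # One request at a time: only 1 of 4 stages is busy (0.25).
    print(pipeline_utilization(DEVICES, 1))
    # A 32-way pipelined minibatch keeps the stages mostly busy (about 0.91).
    print(pipeline_utilization(DEVICES, 32))

    pp = pipeline_bytes_per_token(DEVICES, HIDDEN, FP16)
    tp = tensor_parallel_bytes_per_token(DEVICES, LAYERS, HIDDEN, FP16)
    print(pp, tp, tp // pp)  # 49152 15728640 320
```

Under these assumptions tensor parallelism moves 4 × layers = 320 times as many bytes per token, and it does so across 2 × layers = 160 synchronization points rather than 3 stage hops, so both bandwidth and per-message latency of the link end up on the critical path.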