Hacker News

27183 today at 1:49 AM

Given the size of the datacenter-class GPUs they're running these models on, don't they need to process multiple tenants concurrently on each GPU to extract the full potential of the hardware?

I agree. Shuffling data between the CPU and GPU is itself fraught with peril: it combines the hairiest distributed systems problems with the sketchiest memory safety issues, all in one place.