Hacker News

spencer9714, yesterday at 10:52 PM

Interesting concept. One thing I’m curious about: if I’m in a cohort for something like DeepSeek V3 and another user spins up a heavy 24/7 job, how do you keep TTFT from degrading? vLLM’s continuous batching helps, but there’s still a physical limit with shared VRAM/compute. I’ve been grappling with this exact 'noisy neighbor' issue while building Runfra. We actually ended up moving toward a credit-per-task model on idle GPUs specifically to avoid that resource contention entirely.

Curious how you’re thinking about isolation here. Is there any hard guarantee on a 'slice' of the GPU, or is it mostly just handled by the vLLM scheduler?