Useful, will try this.
One thing I'd love to see in tools like this is a "memory pressure" view that shows not just current VRAM usage but how close you are to the OOM cliff for the workload you're running. When you're running quantized LLMs on consumer GPUs (e.g. a Q4_K_M Gemma 4 E4B on an 8GB card), you can be at 95% utilization and totally fine, or at 80% and one context spike away from a crash. nvtop and nvidia-smi give you the number but not the headroom.
Whether that's feasible without instrumenting the workload specifically is another question. But it's the metric I actually care about when I'm picking quantization levels.
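The headroom I mean is roughly free VRAM minus worst-case KV-cache growth over the rest of the context window. A back-of-envelope sketch (the model shape and numbers here are assumptions for illustration, not measurements of any particular model):

```python
# Headroom estimate: how far is current free VRAM from the worst case,
# i.e. the KV cache growing to fill the remaining context window?
# All model dimensions below are assumed, not taken from a real model card.

def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem):
    # Factor of 2: one K and one V tensor per layer.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem

def oom_headroom_bytes(free_vram, tokens_remaining, kv_per_token):
    # Negative means a full-context spike would push you off the cliff.
    return free_vram - tokens_remaining * kv_per_token

# Hypothetical fp16 KV cache, 32 layers / 8 KV heads / head_dim 128:
kv = kv_cache_bytes_per_token(32, 8, 128, 2)  # 131072 bytes = 128 KiB/token
# ~1.5 GiB free, 6000 tokens of context left to fill:
print(oom_headroom_bytes(int(1.5 * 2**30), 6000, kv))
```

You'd still need the tool to know (or be told) the model's KV-cache geometry and remaining context, which is exactly the instrumentation question.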