Doesn't Windows already do this by default? I can already run models bigger than my GPU's VRAM, and it will start using up to 50% of my system RAM as "shared memory". This is on a desktop PC without a shared memory architecture.
The NVIDIA Windows driver enables RAM swapping by default.
Great way to get backstabbed if you care about inference speed: anything that spills into system RAM is accessed over PCIe, which is far slower than VRAM.
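If you want to know ahead of time whether a model is likely to spill, here is a rough Python sketch. It assumes `nvidia-smi` is on your PATH and reports plain integers for memory. The helper names (`parse_vram_mib`, `would_spill`, `query_vram_mib`) and the 512 MiB headroom are made up for illustration, not anything the driver or Ollama exposes.

```python
import subprocess


def parse_vram_mib(smi_output: str) -> list[tuple[int, int]]:
    """Parse 'total, used' CSV lines (in MiB) into (total, used) tuples, one per GPU."""
    gpus = []
    for line in smi_output.strip().splitlines():
        total, used = (int(field.strip()) for field in line.split(","))
        gpus.append((total, used))
    return gpus


def would_spill(model_mib: int, total_mib: int, used_mib: int,
                headroom_mib: int = 512) -> bool:
    """Return True if the model plus some headroom exceeds the free dedicated VRAM.

    headroom_mib is an arbitrary guess at extra space for context and buffers.
    """
    free_mib = total_mib - used_mib
    return model_mib + headroom_mib > free_mib


def query_vram_mib() -> list[tuple[int, int]]:
    """Ask nvidia-smi for total and used dedicated memory per GPU (in MiB)."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=memory.total,memory.used",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_vram_mib(result.stdout)
```

On a machine with an NVIDIA GPU, something like `would_spill(model_mib, *query_vram_mib()[0])` gives a quick yes/no for the first GPU. It is only an estimate, since real usage also depends on context length and the runtime's own buffers.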
I don't think Windows does this, but Ollama does.