> the dense 9B fits on a single 80GB GPU
Us mere mortals cannot use this.
Seems weird. A 9B model would normally fit unquantised on a 24GB GPU: at 16 bits per weight, the weights alone come to roughly 18GB.
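
For anyone who wants to check the arithmetic, here is a rough back-of-the-envelope sketch. It only counts the weights, so it ignores the KV cache, activations, and framework overhead, all of which grow with context length and batch size. The helper name `weight_memory_gb` is made up for illustration, and "9B" is assumed to mean exactly 9e9 parameters.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate size of the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    params = 9e9  # assumption: "9B" taken as exactly 9 billion parameters
    for bits in (16, 8, 4):
        # Weights only; actual VRAM use will be higher once inference starts.
        print(f"{bits}-bit: {weight_memory_gb(params, bits):.1f} GB")
```

By this estimate the 16-bit weights leave only a few GB of headroom on a 24GB card, so a long context could still push it over, but it is nowhere near needing 80GB for the weights themselves.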
There are already quantizations available.