https://www.github.com/antirez/ds4 (from Antirez of Redis fame) runs a 2-bit quant on Apple Silicon hardware with 96GB or 128GB of RAM.
I've been keeping an eye on Antirez's Metal fork for llama.cpp, but I totally missed this. Whoa, nice. Giving it a go, thanks!!