> Also, note that there's zero CUDA dependency.
So does this mean I can run this on AMD? And on a consumer 9000 series card?
If you found a rare 9000 card with 200+ GB of VRAM, sure
If the card supports Vulkan and the model has GGUF weights, yes. llama.cpp has excellent Vulkan support that is being actively developed and is not far behind CUDA where speed is concerned.
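For what that looks like in practice, here is a minimal sketch of building llama.cpp with its Vulkan backend and running a GGUF model on an AMD card. The model path is a placeholder, and the build flag reflects recent llama.cpp versions (older ones used a different flag name), so treat the exact invocation as an assumption:

```shell
# Sketch, not a definitive guide: build llama.cpp with the Vulkan backend.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_VULKAN=ON   # recent versions; older builds used a different flag
cmake --build build --config Release

# Run a GGUF model; /path/to/model.gguf is a placeholder.
# -ngl offloads that many layers to the GPU (99 = effectively all of them).
./build/bin/llama-cli -m /path/to/model.gguf -ngl 99 -p "Hello"
```

No CUDA toolkit is involved anywhere in that pipeline; the only requirement on the GPU side is a working Vulkan driver.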
If you don't have the source code, it makes no difference. If you have the weights and are running the model via llama.cpp, you are using whatever API llama.cpp exposes, not the API that was used to train the model or that anyone else may be using to serve it.