Strangely, I haven't had much luck with vLLM; I finally ended up ditching Ollama and going straight to the tap with llama-server in llama.cpp. No regrets.
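
For anyone wondering what "straight to the tap" looks like in practice, here's a minimal sketch of talking to llama-server from Python. It assumes you've already started the server locally (e.g. with a model loaded on the default port 8080; the port and prompt here are just placeholders for your own setup) and uses its OpenAI-compatible chat endpoint:

```python
import requests

# Query a locally running llama-server instance via its
# OpenAI-compatible /v1/chat/completions endpoint.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [
            {"role": "user", "content": "Say hello in one sentence."}
        ],
        "max_tokens": 64,
    },
    timeout=60,
)
resp.raise_for_status()

# llama-server serves a single model, so no model field is needed.
print(resp.json()["choices"][0]["message"]["content"])
```

Since the endpoint is OpenAI-compatible, any OpenAI client library pointed at `http://localhost:8080/v1` should work the same way.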