A similar recent post, with optimizations for an older Xeon:
High-Performance AI on a Budget: Optimizing llama.cpp for Qwen3.5 Inference on a Dual-GPU HP Z440
https://news.ycombinator.com/item?id=47320244