What kind of hardware are you planning to run this on? As mentioned already, I've been trying to understand how gracefully it might degrade on 64GB RAM or perhaps lower (the total weights size is 80GB at the provided quant) using SSD offload for the weights, and then (assuming it works and doesn't just OOM) whether the tok/s figures might meaningfully improve in that scenario by running multiple sessions in parallel.
What kind of hardware are you planning to run this on? As mentioned already, I've been trying to understand how gracefully it might degrade on 64GB RAM or perhaps lower (the total weights size is 80GB at the provided quant) using SSD offload for the weights, and then (assuming it works and doesn't just OOM) whether the tok/s figures might meaningfully improve in that scenario by running multiple sessions in parallel.