Check out the new Qwen coder model.
Also, aren't there different affinities for 8-bit vs 4-bit inference?
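
For a rough feel of the numeric side of the 8-bit vs 4-bit question, here's a minimal sketch. It assumes plain symmetric per-tensor quantization and made-up Gaussian "weights"; the helper names `quantize_roundtrip` and `mean_abs_error` are hypothetical, not from any library. Real quantizers typically use per-group scales and other tricks, and this says nothing about speed or hardware support.

```python
import random


def quantize_roundtrip(weights, bits):
    """Quantize to signed `bits`-bit integers with one shared scale, then dequantize."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax
    return [round(w / scale) * scale for w in weights]


def mean_abs_error(weights, bits):
    """Average absolute difference between original and round-tripped weights."""
    restored = quantize_roundtrip(weights, bits)
    return sum(abs(w - r) for w, r in zip(weights, restored)) / len(weights)


random.seed(0)
# Assumption: synthetic weights, not taken from any real model.
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]

err8 = mean_abs_error(weights, 8)
err4 = mean_abs_error(weights, 4)
print(f"8-bit mean abs error: {err8:.6f}")
print(f"4-bit mean abs error: {err4:.6f}")
```

With only 15 usable levels instead of 255, the 4-bit round trip loses noticeably more precision on the same weights. How much that matters in practice depends on the model and the quantization scheme.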