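NPUs are more useful for prefill than for decode anyway. Prefill pushes the whole prompt through the model in one batch, so each weight read from memory is reused across many tokens and the work is typically compute-bound. Memory bandwidth is not the bottleneck for prefill the way it is for decode, where every generated token has to stream the weights again.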
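A rough back-of-the-envelope sketch of that argument, using a simple roofline-style comparison. The hardware numbers (`PEAK_FLOPS`, `MEM_BANDWIDTH`) and the function names are made up for illustration and do not describe any particular NPU. The model also only counts weight traffic and ignores activations and the KV cache.

```python
def arithmetic_intensity(tokens: int, bytes_per_param: float) -> float:
    """FLOPs performed per byte of weights read for a dense layer.

    Assumes 2 FLOPs (multiply + add) per parameter per token, and that
    the weights are read from memory once for the whole batch of tokens.
    """
    return 2.0 * tokens / bytes_per_param


def is_compute_bound(tokens: int, bytes_per_param: float,
                     peak_flops: float, mem_bandwidth: float) -> bool:
    """True if the workload sits above the machine balance point."""
    # Machine balance: FLOPs the chip can do per byte it can fetch.
    balance = peak_flops / mem_bandwidth
    return arithmetic_intensity(tokens, bytes_per_param) > balance


# Hypothetical accelerator, illustrative numbers only.
PEAK_FLOPS = 40e12      # 40 TFLOP/s of compute
MEM_BANDWIDTH = 100e9   # 100 GB/s of memory bandwidth

# fp16 weights (2 bytes per parameter).
prefill = is_compute_bound(1024, 2, PEAK_FLOPS, MEM_BANDWIDTH)  # 1024-token prompt
decode = is_compute_bound(1, 2, PEAK_FLOPS, MEM_BANDWIDTH)      # one token per step

print(f"prefill compute-bound: {prefill}")
print(f"decode compute-bound:  {decode}")
```

With these assumed numbers the balance point is 400 FLOPs per byte. A 1024-token prefill lands well above it, while single-token decode sits at 1 FLOP per byte, so extra compute from an NPU helps the former far more than the latter.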