The tradeoff is that the M3 Ultra's GPU loses to laptop GPUs in compute benchmarks. Prefill is compute-bound, so all of that memory bandwidth sits idle while the GPU grinds through the prompt.
For inference workloads, it makes a lot more sense to optimize for prefill/TTFT (time to first token) before maxing out memory bandwidth.
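To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes a dense model, batch size 1, about 2 FLOPs per parameter per token for prefill, and one full read of the weights per generated token for decode (KV cache traffic ignored). The function names and the hardware figures are placeholders I made up for illustration, not measured M3 Ultra specs.

```python
def prefill_seconds(params, prompt_tokens, flops_per_sec):
    """Compute-bound estimate: ~2 FLOPs per parameter per prompt token."""
    return 2 * params * prompt_tokens / flops_per_sec


def decode_tokens_per_sec(params, bytes_per_param, bytes_per_sec):
    """Bandwidth-bound estimate: every weight is read once per generated token."""
    return bytes_per_sec / (params * bytes_per_param)


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the shape of the tradeoff.
    params = 70e9           # 70B-parameter dense model
    prompt_tokens = 8000    # long prompt
    flops_per_sec = 28e12   # assumed sustained GPU throughput
    bytes_per_sec = 800e9   # assumed memory bandwidth
    bytes_per_param = 0.5   # 4-bit quantized weights

    ttft = prefill_seconds(params, prompt_tokens, flops_per_sec)
    tps = decode_tokens_per_sec(params, bytes_per_param, bytes_per_sec)
    print(f"prefill (TTFT): {ttft:.1f} s")
    print(f"decode: {tps:.1f} tokens/s")
```

With these made-up inputs you wait about 40 s for the first token and then get about 23 tokens/s. Doubling the bandwidth would not touch the 40 s at all; only more compute does.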
With the M6 theoretically coming later this year, Apple seems to be realizing it needs to catch up by adding more GPU compute.