bigyabai • last Wednesday at 6:15 PM • 1 reply

The tradeoff is that the M3 Ultra's GPU loses to laptop GPUs in compute benchmarks. All of that bandwidth sits idle during token prefill, which is limited by compute rather than by memory bandwidth.

For inference workloads, it makes a lot more sense to optimize for prefill/TTFT (time to first token) before maxing out memory bandwidth.
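To put rough numbers on the prefill-versus-bandwidth point, here is a back-of-envelope sketch. It assumes a dense model at batch size 1, about 2 FLOPs per parameter per token for prefill, and one full read of the weights per generated token for decode. The function names and the hardware figures in the example are hypothetical placeholders, not measurements of any real chip.

```python
def prefill_seconds(params: float, prompt_tokens: int, flops_per_sec: float) -> float:
    """Compute-bound estimate: ~2 FLOPs per parameter per prompt token."""
    return 2 * params * prompt_tokens / flops_per_sec


def decode_seconds_per_token(params: float, bytes_per_param: float,
                             bytes_per_sec: float) -> float:
    """Bandwidth-bound estimate: every weight is read once per generated token."""
    return params * bytes_per_param / bytes_per_sec


# Hypothetical machine and model, chosen only to illustrate the arithmetic.
PARAMS = 70e9            # 70B-parameter dense model
PROMPT_TOKENS = 8000     # long prompt
FLOPS = 50e12            # assumed sustained compute, FLOP/s
BANDWIDTH = 800e9        # assumed memory bandwidth, bytes/s
BYTES_PER_PARAM = 0.5    # 4-bit quantized weights

ttft = prefill_seconds(PARAMS, PROMPT_TOKENS, FLOPS)
per_token = decode_seconds_per_token(PARAMS, BYTES_PER_PARAM, BANDWIDTH)
print(f"estimated prefill (TTFT): {ttft:.1f} s")
print(f"estimated decode: {per_token * 1000:.1f} ms/token")
```

Under these assumptions, doubling bandwidth only shortens the per-token decode time, while the wait before the first token scales with compute alone.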

Schiendelman • last Wednesday at 11:30 PM

With the M6 theoretically coming later this year, Apple seems to be realizing they need to catch up with more lanes of GPU.

alt Hacker News