Again, memory bandwidth is pretty much all that matters here. During inference or training, the CUDA cores of retail GPUs sit at something like 15% utilization.
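A quick back-of-envelope sketch of why batch-1 decoding ends up bandwidth-bound. All numbers here are illustrative assumptions (a 7B fp16 model on roughly 4090-class hardware), not benchmarks:

```python
# Roofline-style estimate: each generated token must stream the entire
# weight set from VRAM, so memory bandwidth caps decode throughput.

params = 7e9            # 7B-parameter model (assumed)
bytes_per_param = 2     # fp16/bf16 weights
bandwidth = 1.0e12      # ~1 TB/s VRAM bandwidth (assumed, 4090-class)
peak_flops = 80e12      # ~80 TFLOPS dense fp16 (assumed, 4090-class)

bytes_per_token = params * bytes_per_param      # weights read per token
max_tok_s = bandwidth / bytes_per_token         # bandwidth-bound ceiling
flops_per_token = 2 * params                    # ~2 FLOPs per weight
used_flops = max_tok_s * flops_per_token

print(f"decode ceiling: {max_tok_s:.0f} tok/s, "
      f"compute utilization: {used_flops / peak_flops:.1%}")
# -> ~71 tok/s, with the compute units ~1% busy at batch size 1
```

At batch size 1 the math comes out even lower than 15%; batching, attention, and training workloads push utilization up, but the compute units still spend most of their time waiting on memory.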
Not for prompt processing. Current Macs are really not great at long contexts.
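The reason is that prefill flips the bottleneck: all prompt tokens are processed in parallel, so each weight fetched from memory is reused across the whole batch of tokens and raw FLOPs dominate instead of bandwidth. A sketch with hypothetical peak numbers (the model size, context length, and both FLOPs figures are assumptions for illustration, and real prefill never hits peak):

```python
# Why long prompts hurt on compute-weak hardware: prefill cost scales
# with prompt length in FLOPs, not in weight-memory traffic.

params = 7e9                       # 7B-parameter model (assumed)
prompt_tokens = 8_000              # long context (assumed)
flops = 2 * params * prompt_tokens # ~2 FLOPs per weight per token

gpu_flops = 80e12                  # assumed, 4090-class fp16 matmul
mac_flops = 15e12                  # assumed, M-series-class matmul

print(f"prefill floor on GPU: {flops / gpu_flops:.1f} s")   # ~1.4 s
print(f"prefill floor on Mac: {flops / mac_flops:.1f} s")   # ~7.5 s
```

The Mac's unified memory bandwidth doesn't help here; once the prompt is long enough, the weaker matmul throughput is what you feel.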