
liuliu • yesterday at 8:36 PM

It depends on what you do. If you are doing token generation, compute-dense kernel optimization is less interesting (since it is memory-bound) than latency optimizations elsewhere (data transfers, kernel invocations, etc.). And for these, Mac devices actually have a leg up on CUDA, as Metal shader pipelines are pretty much optimized for latency (a.k.a. games) while CUDA kernel launches were not (until CUDA Graphs were introduced, and of course there are other issues).
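To put rough numbers on the memory-bound point, here is a minimal Python sketch of the usual roofline-style check. Everything in it is an assumption for illustration: the helper names `arithmetic_intensity` and `is_memory_bound` are hypothetical, the layer size is made up, the peak figures (100 TFLOP/s, 1 TB/s) are round numbers and not any specific device, and activation traffic is ignored so that only weight reads count.

```python
def arithmetic_intensity(batch, d_in, d_out, bytes_per_weight=2):
    """FLOPs per byte of weight traffic for a (batch x d_in) @ (d_in x d_out) matmul.

    Assumption: weight reads dominate memory traffic, so activations are ignored.
    """
    flops = 2 * batch * d_in * d_out                 # one multiply and one add per weight, per row
    weight_bytes = d_in * d_out * bytes_per_weight   # each weight is read once (fp16 by default)
    return flops / weight_bytes


def is_memory_bound(intensity, peak_flops, peak_bandwidth):
    """A kernel is memory-bound when its intensity is below the machine balance."""
    machine_balance = peak_flops / peak_bandwidth    # FLOPs the device can do per byte moved
    return intensity < machine_balance


# Hypothetical round numbers, not a real device.
PEAK_FLOPS = 100e12      # 100 TFLOP/s
PEAK_BANDWIDTH = 1e12    # 1 TB/s

decode = arithmetic_intensity(batch=1, d_in=4096, d_out=4096)     # one token at a time
prefill = arithmetic_intensity(batch=512, d_in=4096, d_out=4096)  # many tokens at once

print(decode, is_memory_bound(decode, PEAK_FLOPS, PEAK_BANDWIDTH))
print(prefill, is_memory_bound(prefill, PEAK_FLOPS, PEAK_BANDWIDTH))
```

Under these assumptions, single-token decoding does about 1 FLOP per byte of weights read, far below the assumed machine balance of 100 FLOPs per byte, so a faster matmul kernel buys little and the remaining time goes to memory traffic and per-launch overhead.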
