logoalt Hacker News

hajileyesterday at 8:07 PM0 repliesview on HN

SIMD workloads on CPU tend to be bursty. If your workload is all SIMD with few other instructions or branches, it's almost certainly going to be faster on a GPU or SME co-processor.

If there's space between the SIMD instructions, then double-pumping or even quad-pumping isn't very expensive (and with 6 SIMD ports, it might even be basically free).