logoalt Hacker News

p1esktoday at 2:33 PM4 repliesview on HN

A100 FP32 throughput “at its limit”: 19.5 TFLOP/s.

AMD EPYC 9965 FP32 throughput “at its limit”: 41.2 TFLOP/s (192 cores x 64 FP32 FLOP/cycle/core x 3.35GHz).


Replies

aesthesiatoday at 7:45 PM

That's also a CPU that came out four years later than the A100. The contemporaneous B200 is not optimized for FP32 and does 74.45 TFLOP/s. For FP16 it's at ~2 PFLOP/s.

show 1 reply
zzzoomtoday at 7:13 PM

EPYC 9965: 614GBps of 12-channel DDR5-6400

A100: 1935GBps of HBM2e

Most of those FLOPS are constrained by memory bandwidth.

show 1 reply
toshtoday at 2:46 PM

A100: 312 TFLOP/s for FP16

but it is very impressive how far modern CPUs get as well (also in smart phones!)

show 1 reply