> in the time that Python can perform a single FLOP, an A100 could have chewed through 9.75 million FLOPS
wild
This statement makes zero sense
re comments:
yes of course this is apples to oranges but that's kind of the point
it shows the vast gap between specialized hardware throughput (IFF you can use an A100 at its limit) and the overhead of one of the most popular programming languages in use today, which eventually does the "same thing" on a CPU
the interesting thing is why that is so
CPU vs GPU (latency vs throughput), boxing vs dense representation, interpreter overhead, scalar execution, layers upon layers, …
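fwiw, a rough sketch of how you could sanity-check the interpreter-overhead part yourself. Assumptions that are not from this thread: the ~312e12 FLOP/s figure is the commonly quoted A100 peak for dense FP16 tensor-core math, and the function names are made up for illustration. It times boxed float adds in a plain loop and divides the assumed peak by that rate. Numbers will vary a lot by machine and Python version, so treat it as an order-of-magnitude check only.

```python
import time

# Assumed figure, not taken from this thread: commonly quoted A100 peak
# for dense FP16 tensor-core math. Used here only for scale.
A100_PEAK_FLOPS = 312e12


def python_adds_per_second(n: int = 1_000_000) -> float:
    """Time n float additions in a plain interpreter loop."""
    x = 0.0
    start = time.perf_counter()
    for _ in range(n):
        x += 1.0  # one float add, plus bytecode dispatch and object boxing
    elapsed = time.perf_counter() - start
    return n / elapsed


def gpu_flops_per_python_flop(peak: float, python_rate: float) -> float:
    """FLOPs the accelerator could do at peak during one Python add."""
    return peak / python_rate


if __name__ == "__main__":
    rate = python_adds_per_second()
    ratio = gpu_flops_per_python_flop(A100_PEAK_FLOPS, rate)
    print(f"pure-Python adds/s: {rate:.3g}")
    print(f"ratio vs assumed peak: {ratio:.3g}")
```

If you plug in roughly 32 million adds/s for Python (again an assumption), 312e12 / 32e6 works out to 9.75 million, which would match the quoted number.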
Which, let's be honest, is probably still being orchestrated by Python somewhere.
Python is 9.75 million times faster than Python.
Single core vs multi core accounts for much of this
Why are we comparing a programming language and a GPU? This is a category error. Programming languages do not do any operations. They perform no FLOPs, they are the thing the FLOPs are performing.
"The i7-4770K can perform 20k more FLOPs than C++" is an equally sensible statement (i.e. not)