
imtringued • yesterday at 10:49 AM

Modern Nvidia GPUs perform inference with a dedicated processor, the Tensor Memory Accelerator (TMA), that performs tensor memory operations directly. That is a direct concession that GPGPU as a paradigm is too inefficient for matrix multiplication.
