Hacker News

moralestapia, yesterday at 7:34 AM

>HOW NVIDIA GPUs process stuff? (Inefficiency 101)

Wow. Massively ignorant take. A modern GPU is an amazing feat of engineering, particularly at making computation more efficient (low power / high throughput).

Then proceeds to explain, wrongly, how inference is supposedly implemented and draws conclusions from there ...


Replies

beAroundHere, yesterday at 7:43 AM

Hey, can you please point out the inaccuracies in the article?

I wrote this post to give a high-level understanding of traditional vs. Taalas's inference, so it does abstract away a lot of things.

wmf, yesterday at 8:31 AM

Arguably DRAM-based GPUs/TPUs are quite inefficient for inference compared to SRAM-based Groq/Cerebras. GPUs are highly optimized but they still lose to different architectures that are better suited for inference.
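The bandwidth argument here can be made concrete with back-of-envelope arithmetic: single-stream decoding must stream every weight from memory once per token, so per-stream throughput is roughly bandwidth divided by model size. The figures below are rounded, illustrative assumptions (roughly H100-class HBM vs. Cerebras-class on-chip SRAM), not vendor specs:

```python
# Back-of-envelope: single-stream LLM decoding is memory-bandwidth bound.
# Each decoded token streams all weights from memory once, so:
#   tokens/s per stream ~= memory bandwidth / model size in bytes
# All numbers below are rounded assumptions for illustration only.

def max_tokens_per_s(weight_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Upper bound on per-stream decode throughput."""
    return bandwidth_bytes_per_s / weight_bytes

model_bytes = 70e9 * 2   # hypothetical 70B-parameter model at FP16 (~140 GB)
dram_bw = 3.35e12        # ~3.35 TB/s HBM (high-end DRAM-based GPU, rounded)
sram_bw = 80e12          # ~tens of TB/s aggregate on-chip SRAM (rounded)

print(f"DRAM-based GPU : ~{max_tokens_per_s(model_bytes, dram_bw):.0f} tok/s per stream")
print(f"SRAM-based chip: ~{max_tokens_per_s(model_bytes, sram_bw):.0f} tok/s per stream")
```

The same arithmetic explains why batching helps DRAM GPUs (weights are streamed once per batch, not per token), while SRAM-based designs win on single-stream latency.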

imtringued, yesterday at 10:49 AM

Modern Nvidia GPUs perform inference with a dedicated unit, the Tensor Memory Accelerator (TMA), that issues tensor memory operations directly, which in itself concedes that GPGPU as a paradigm is too inefficient for matrix multiplication.