fc417fc802 yesterday at 6:49 AM

> isn't a perfect fit for these algorithms but it's relatively close

I don't think that's true. The best fit out of what's presently available, perhaps. Inference is almost entirely memory-bandwidth bound at present, to the extent that GPUs with HBM have a massive advantage over those with GDDR. TPUs appear to be a much better overall design.
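
To put rough numbers on the bandwidth-bound claim, here is a back-of-the-envelope sketch. It assumes batch-1 decoding in which every weight is read from memory once per generated token and ignores KV-cache traffic, caching, and overlap. The function names, the model size, and the bandwidth figures are hypothetical round numbers chosen for illustration, not specs of any real part.

```python
def max_tokens_per_second(param_count, bytes_per_param, bandwidth_bytes_per_s):
    """Upper bound on decode rate if all weights are streamed once per token."""
    weight_bytes = param_count * bytes_per_param
    return bandwidth_bytes_per_s / weight_bytes


def arithmetic_intensity(batch_size, bytes_per_param):
    """FLOPs per byte of weights read.

    Assumes about 2 FLOPs (one multiply, one add) per parameter
    for each sequence in the batch.
    """
    return 2 * batch_size / bytes_per_param


# Hypothetical example: 70e9 parameters stored at 2 bytes each.
PARAMS = 70e9
BYTES_PER_PARAM = 2

# Two hypothetical memory systems: 1 TB/s and 3 TB/s.
for bandwidth in (1e12, 3e12):
    rate = max_tokens_per_second(PARAMS, BYTES_PER_PARAM, bandwidth)
    print(f"{bandwidth / 1e12:.0f} TB/s -> at most {rate:.1f} tokens/s")

# At batch size 1 this is about 1 FLOP per byte of weights read.
print("FLOPs per byte at batch 1:", arithmetic_intensity(1, BYTES_PER_PARAM))
```

Under these assumptions the ceiling scales linearly with memory bandwidth: tripling bandwidth triples the bound, while adding math units does nothing until the batch size raises the FLOPs-per-byte ratio.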

I expect that a hypothetical advance in fabrication enabling processing elements to be placed directly adjacent to dense RAM on the same silicon (not merely in the same package) would be superior in all regards.


Replies

Dylan16807 yesterday at 5:31 PM

> I expect that a hypothetical advance in fabrication enabling processing elements to be placed directly adjacent to dense RAM on the same silicon (not merely in the same package) would be superior in all regards.

Processing scales better than DRAM does. I think an HBM-like stack where the bottom layer has the math units is probably the ultimate form of that.

And it's possible that flash instead of DRAM is actually the better play, as long as you can hook up enough in parallel. RIP Optane.