Hacker News

exabrial · today at 3:03 AM · 1 reply

The way to do it _today_ requires enormous amounts of HBM! However, we've never designed dedicated inference accelerators, which is actually quite a "trivial" problem; we've just never had the need.

Groq (acqui-hired by NVidia) came up with a different processor architecture: metric shit-tons of SRAM attached to a modest single-core deterministic processor. No HBM is needed on the card, and it delivers 32x faster inference than today's best GPUs.
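A rough intuition for why on-chip SRAM pays off: decoding one token requires streaming essentially every weight through the compute units once, so decode speed is bounded by memory bandwidth divided by model size. A minimal back-of-envelope sketch, where all the bandwidth and model-size numbers are illustrative assumptions (not vendor specs):

```python
# Back-of-envelope: bandwidth-bound decode speed.
# tokens/sec ~= memory_bandwidth / model_bytes, since each decoded
# token streams the full weight set once.

def tokens_per_sec(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on decode throughput for a bandwidth-bound model."""
    return bandwidth_gb_s / model_gb

# Assumed, illustrative numbers:
hbm_bw = 3_350    # GB/s, ballpark for one high-end HBM GPU
sram_bw = 80_000  # GB/s, ballpark aggregate on-chip SRAM across a rack
model = 140       # GB, e.g. a 70B-parameter model at 16-bit weights

print(f"HBM:  ~{tokens_per_sec(hbm_bw, model):.0f} tokens/s")
print(f"SRAM: ~{tokens_per_sec(sram_bw, model):.0f} tokens/s")
print(f"speedup: ~{sram_bw / hbm_bw:.0f}x")
```

With these made-up inputs the SRAM design comes out roughly 24x faster, which is at least the right order of magnitude for the 32x claim above; the real ratio depends entirely on the actual bandwidth figures.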

These LPUs are pretty useless for training though, which is a problem for companies that train their own models! Training is expensive, inference is cheap (someday, not now).

There's also a Canadian company that _literally burned the model into the silicon mask_ of a chip. It's unbelievably fast (1000x), but of course not flexible at all: https://chatjimmy.ai


Replies

kcb · today at 3:13 AM

The point is that metric shit-tons of SRAM is still a large amount of expensive memory.