How much dedicated cache do these NPUs have? Because it's easy enough to saturate the memory bandwidth using the CPU for compute, never mind the GPU. Adding dark silicon for some special operations isn't going to make our memory bandwidth any faster.
Are we going to see more memory channels for consumer desktop at some point from AMD or Intel? Apple seems to be the only one that offers it.
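For a rough sense of why channel count matters so much: peak DRAM bandwidth is just channels × transfer rate × bus width. A back-of-envelope sketch (the DDR5-5600 and 512-bit LPDDR5-6400 configurations below are illustrative assumptions, not any specific product's spec):

```python
# Back-of-envelope peak DRAM bandwidth: channels x MT/s x bus width.
# Spec numbers here are illustrative assumptions, not measurements.

def peak_bw_gbs(channels: int, mts: int, bus_bits: int = 64) -> float:
    """Theoretical peak bandwidth in GB/s for `channels` memory channels
    running at `mts` megatransfers/sec over a `bus_bits`-wide bus each."""
    return channels * mts * (bus_bits / 8) / 1000

# Typical dual-channel DDR5-5600 desktop:
desktop = peak_bw_gbs(channels=2, mts=5600)   # ~89.6 GB/s

# A wide unified-memory part, e.g. 8x 64-bit LPDDR5-6400 (assumed config):
wide = peak_bw_gbs(channels=8, mts=6400)      # ~409.6 GB/s

print(f"dual-channel DDR5-5600: {desktop:.1f} GB/s")
print(f"512-bit LPDDR5-6400:    {wide:.1f} GB/s")
```

The ~4-5x gap between a dual-channel desktop and a wide unified-memory bus is the whole story for why the latter is so much better at local inference.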
NPUs are more useful for prefill than decode anyway. Memory bandwidth is not the bottleneck for prefill.
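The prefill/decode split comes down to arithmetic intensity: prefill amortizes each weight read across a whole batch of prompt tokens, while decode reads the weights for a single token. A rough sketch (7B params and fp16 are assumed numbers; only the FLOPs-per-byte ratio matters):

```python
# Rough arithmetic-intensity comparison of prefill vs decode for a
# dense transformer. 7B fp16 is an illustrative assumption.

params = 7e9
bytes_per_param = 2                 # fp16
weight_bytes = params * bytes_per_param

# ~2 FLOPs per parameter per token (one multiply + one add).
flops_per_token = 2 * params

def flops_per_weight_byte(batch_tokens: int) -> float:
    """FLOPs performed per byte of weights streamed from memory.
    Prefill processes many tokens per weight read; decode only one."""
    return batch_tokens * flops_per_token / weight_bytes

decode = flops_per_weight_byte(1)      # ~1 FLOP/byte: memory-bound
prefill = flops_per_weight_byte(512)   # ~512 FLOPs/byte: compute-bound
print(decode, prefill)
```

At ~1 FLOP per byte, decode sits far below the roofline of any modern chip, so extra compute (NPU or otherwise) can't help it; prefill at hundreds of FLOPs per byte is where compute actually pays off.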
Does a cache help with inference workloads anyway?
I don't know much about it, but my mental model is that for transformers you need random access to billions of parameters.
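One refinement to that mental model: for a dense model, decode touches essentially every parameter once per generated token, as large sequential streams rather than random access, which is why cache barely helps and bandwidth sets a hard ceiling on tokens/sec. A quick sanity check with assumed numbers (14 GB for a 7B fp16 model, bandwidth figures illustrative):

```python
# Upper bound on dense-model decode speed: each generated token must
# stream roughly all weights from memory once, so
#   tokens/sec <= bandwidth / model size.
# Model size and bandwidth figures below are illustrative assumptions.

def max_tokens_per_sec(model_gb: float, bw_gbs: float) -> float:
    """Bandwidth-limited ceiling on decode throughput."""
    return bw_gbs / model_gb

# 7B model in fp16 ~ 14 GB of weights:
print(max_tokens_per_sec(14, 90))    # dual-channel desktop: ~6.4 tok/s
print(max_tokens_per_sec(14, 400))   # wide unified memory: ~28.6 tok/s
```

No amount of on-die cache changes this ceiling, since the working set per token is the entire model.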