Hacker News

snovv_crash yesterday at 6:30 AM

How much dedicated cache do these NPUs have? Because it's easy enough to saturate the memory bandwidth using the CPU for compute, never mind the GPU. Adding dark silicon for some special operations isn't going to make our memory bandwidth any faster.


Replies

bjackman yesterday at 8:14 AM

Does a cache help with inference workloads anyway?

I don't know much about it, but my mental model is that for transformers you need random access to billions of parameters.
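That mental model suggests why a cache buys little for decode: every weight is streamed once per generated token, so a working set of several gigabytes dwarfs any on-chip cache and throughput is capped by memory bandwidth. A rough back-of-envelope sketch (the model size, weight precision, and ~80 GB/s dual-channel DDR5 figure are illustrative assumptions, not measurements):

```python
# Decode throughput ceiling when every weight is read once per token.
# All inputs here are assumed example figures, not benchmarks.

def decode_tokens_per_sec(params_billions: float, bytes_per_param: float,
                          mem_bandwidth_gb_s: float) -> float:
    """Upper bound on tokens/s if decode is purely memory-bandwidth-bound."""
    bytes_per_token = params_billions * 1e9 * bytes_per_param
    return mem_bandwidth_gb_s * 1e9 / bytes_per_token

# 7B model, 8-bit weights, ~80 GB/s dual-channel DDR5 (assumed):
print(round(decode_tokens_per_sec(7, 1, 80), 1))  # ~11.4 tokens/s ceiling
```

No amount of compute (or cache smaller than the model) raises that ceiling; only more bandwidth or fewer bytes per weight does.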

zigzag312 yesterday at 11:43 AM

Are we going to see more memory channels for consumer desktop at some point from AMD or Intel? Apple seems to be the only one that offers it.

zozbot234 yesterday at 12:08 PM

NPUs are more useful for prefill than decode anyway. Memory bandwidth is not the bottleneck for prefill.