How much dedicated cache do these NPUs have? Because it's easy enough to saturate the memory bandwidth using the CPU for compute, never mind the GPU. Adding dark silicon for some special operations isn't going to make our memory bandwidth any faster.
Are we going to see more memory channels for consumer desktop at some point from AMD or Intel? Apple seems to be the only one that offers it.
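For a rough sense of why channel count matters so much: peak DRAM bandwidth is just channels × transfer rate × bus width. A back-of-envelope sketch (the DDR5-5600 and 512-bit LPDDR5-6400 configurations below are illustrative assumptions, not any specific product's spec):

```python
# Back-of-envelope peak DRAM bandwidth: channels x MT/s x bus width.
# Spec numbers here are illustrative assumptions, not measurements.

def peak_bw_gbs(channels: int, mts: int, bus_bits: int = 64) -> float:
    """Theoretical peak bandwidth in GB/s for `channels` memory channels
    running at `mts` megatransfers/sec over a `bus_bits`-wide bus each."""
    return channels * mts * (bus_bits / 8) / 1000

# Typical dual-channel DDR5-5600 desktop:
desktop = peak_bw_gbs(channels=2, mts=5600)   # ~89.6 GB/s

# A wide unified-memory part, e.g. 8x 64-bit LPDDR5-6400 (assumed config):
wide = peak_bw_gbs(channels=8, mts=6400)      # ~409.6 GB/s

print(f"dual-channel DDR5-5600: {desktop:.1f} GB/s")
print(f"512-bit LPDDR5-6400:    {wide:.1f} GB/s")
```

The ~4-5x gap between a dual-channel desktop and a wide unified-memory bus is the whole story for why the latter is so much better at local inference.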
NPUs are more useful for prefill than decode anyway. Memory bandwidth is not the bottleneck for prefill.
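The prefill/decode split comes down to arithmetic intensity: prefill amortizes each weight read across a whole batch of prompt tokens, while decode reads the weights for a single token. A rough sketch (7B params and fp16 are assumed numbers; only the FLOPs-per-byte ratio matters):

```python
# Rough arithmetic-intensity comparison of prefill vs decode for a
# dense transformer. 7B fp16 is an illustrative assumption.

params = 7e9
bytes_per_param = 2                 # fp16
weight_bytes = params * bytes_per_param

# ~2 FLOPs per parameter per token (one multiply + one add).
flops_per_token = 2 * params

def flops_per_weight_byte(batch_tokens: int) -> float:
    """FLOPs performed per byte of weights streamed from memory.
    Prefill processes many tokens per weight read; decode only one."""
    return batch_tokens * flops_per_token / weight_bytes

decode = flops_per_weight_byte(1)      # ~1 FLOP/byte: memory-bound
prefill = flops_per_weight_byte(512)   # ~512 FLOPs/byte: compute-bound
print(decode, prefill)
```

At ~1 FLOP per byte, decode sits far below the roofline of any modern chip, so extra compute (NPU or otherwise) can't help it; prefill at hundreds of FLOPs per byte is where compute actually pays off.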
Does a cache help with inference workloads anyway?
I don't know much about it, but my mental model is that for transformers you need random access to billions of parameters.
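One refinement to that mental model: for a dense model, decode touches essentially every parameter once per generated token, as large sequential streams rather than random access, which is why cache barely helps and bandwidth sets a hard ceiling on tokens/sec. A quick sanity check with assumed numbers (14 GB for a 7B fp16 model, bandwidth figures illustrative):

```python
# Upper bound on dense-model decode speed: each generated token must
# stream roughly all weights from memory once, so
#   tokens/sec <= bandwidth / model size.
# Model size and bandwidth figures below are illustrative assumptions.

def max_tokens_per_sec(model_gb: float, bw_gbs: float) -> float:
    """Bandwidth-limited ceiling on decode throughput."""
    return bw_gbs / model_gb

# 7B model in fp16 ~ 14 GB of weights:
print(max_tokens_per_sec(14, 90))    # dual-channel desktop: ~6.4 tok/s
print(max_tokens_per_sec(14, 400))   # wide unified memory: ~28.6 tok/s
```

No amount of on-die cache changes this ceiling, since the working set per token is the entire model.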