Hacker News

wtallis · yesterday at 7:38 PM

True; everybody's NPUs are afflicted by awkward hardware and software constraints that don't come close to keeping pace with the rapidly shifting interests of ML researchers.

To some degree, that's an unavoidable consequence of how long it takes to design and ship specialized hardware with a supporting software stack. By contrast, ML research moves much faster because researchers hardly ever ship anything product-like; it's a good day when the installation instructions for some ML thing include only three steps that amount to "download more Python packages".

And the lack of cross-vendor standardization for APIs and model formats is also at least partly a consequence of various NPUs evolving from very different starting points and original use cases. For example, Intel's NPUs are derived from Movidius, so they were originally designed for computer vision, and it's not at all a surprise that making them do LLMs might be an uphill battle. AMD's NPU comes from Xilinx IP, so their software mess is entirely expected. Apple and Qualcomm NPUs presumably are still designed primarily to serve smartphone use cases, which didn't include LLMs until after today's chips were designed.

It'll be very interesting to see how this space matures over the next several years, and whether the niche of specialized low-power NPUs survives in PCs or if NVIDIA's approach of only using the GPU wins out. A lot of that depends on whether anybody comes up with a true killer app for local on-device AI.


Replies

zozbot234 · yesterday at 7:55 PM

> It'll be very interesting to see how this space matures over the next several years, and whether the niche of specialized low-power NPUs survives in PCs or if NVIDIA's approach of only using the GPU wins out.

GPUs are gaining their own kinds of specialized blocks, such as matrix/tensor compute units, or BVH acceleration for ray tracing (which may or may not turn out to be useful for other stuff). So I'm not sure there's any real distinction from that POV: a specialized low-power unit in an iGPU is going to be practically indistinguishable from an NPU, except that it will probably be easier to target from existing GPU APIs.
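To make the "easier to target from existing GPU APIs" point concrete: on NVIDIA hardware the tensor cores are already exposed through the ordinary CUDA toolchain via the `nvcuda::wmma` warp matrix functions, so no separate NPU SDK or model-format conversion is needed. A minimal sketch (one warp computing a single 16x16x16 half-precision tile; requires an sm_70+ GPU to actually run, so this is illustrative rather than a tested build):

```cuda
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

// One warp computes C = A * B for a single 16x16x16 tile on the
// tensor cores, using the public wmma API from the CUDA toolkit.
__global__ void wmma_tile_gemm(const half *a, const half *b, float *c) {
    // Fragments are opaque, warp-distributed register tiles.
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);          // zero the accumulator
    wmma::load_matrix_sync(a_frag, a, 16);      // leading dimension 16
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);  // tensor-core MMA
    wmma::store_matrix_sync(c, c_frag, 16, wmma::mem_row_major);
}
```

The kernel name and launch context are hypothetical, but the fragment types and `*_sync` calls are the standard CUDA interface; the point is that the "matrix unit" is just another intrinsic in an API people already ship against, rather than a separate device with its own driver stack.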

imtringued · today at 9:03 AM

Actually, it's a good thing that it's Xilinx IP. The software is nasty to get working, but it's really reliable, because it's used in boards that cost thousands to tens of thousands of dollars. The cost of writing software for it is way too high, though.