
jakogut — yesterday at 4:32 PM

You in fact can now! In the past week, FastFlowLM [0], a transformer inference framework for XDNA 2 NPUs, officially gained Linux support.

I posted it here the same day I found and started using it, to almost no reaction.

[0] https://github.com/FastFlowLM https://fastflowlm.com/ https://huggingface.co/FastFlowLM


Replies

giancarlostoro — yesterday at 6:21 PM

> to almost no reaction.

HN is overloaded with AI stuff; it's hard to break through all the noise. I say this as someone very interested in AI. Even I skip some links because it's just too much.

wing-_-nuts — yesterday at 9:05 PM

I see it making claims about 10x efficiency, but how does it do in tokens/second/watt? The only machines I've seen with the memory bandwidth to effectively do local inference are the M-series ARM chips in Macs.
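The metric being asked for is straightforward to compute from two measurements. A minimal sketch, with entirely made-up numbers (neither the throughput nor the power figures come from FastFlowLM or any real benchmark):

```python
def tokens_per_second_per_watt(tokens_per_second: float, watts: float) -> float:
    """Decode efficiency: tokens generated per second per watt of draw.

    Equivalently, tokens per joule, since 1 W = 1 J/s.
    """
    return tokens_per_second / watts

# Hypothetical example: an NPU decoding at 20 tok/s while drawing 5 W,
# versus a GPU decoding at 40 tok/s while drawing 40 W.
npu_eff = tokens_per_second_per_watt(20.0, 5.0)   # 4.0 tok/s/W
gpu_eff = tokens_per_second_per_watt(40.0, 40.0)  # 1.0 tok/s/W
print(npu_eff, gpu_eff)
```

On numbers like these the NPU would be 4x more efficient per token even while being half as fast, which is exactly the tension raised in the next reply.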

vyr — yesterday at 5:05 PM

Because it's not faster than the Ryzen 395's GPU. Power efficiency doesn't matter as much as TTFT (time to first token) for desktop users, especially when they're tasking their AMD box as a dedicated inference machine.

Some older pre-395 AMD articles suggested it would be possible to use the NPU for prefill and the GPU for decoding, and that this would be faster than using either alone, but we have yet to see that (even on Windows) for any usefully sized model, just toys like LLaMA-8B.
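The hybrid prefill/decode argument can be sketched with simple latency arithmetic. All rates below are hypothetical placeholders, not measured NPU or GPU numbers; the point is only that splitting the phases can win on both TTFT and total time when each unit is faster at a different phase:

```python
def ttft_and_total(prompt_tokens: int, output_tokens: int,
                   prefill_tps: float, decode_tps: float) -> tuple[float, float]:
    """Time-to-first-token and total wall time (seconds) for one request,
    given separate prefill and decode throughputs in tokens/second."""
    ttft = prompt_tokens / prefill_tps
    total = ttft + output_tokens / decode_tps
    return ttft, total

# Made-up rates: suppose the NPU prefills quickly but decodes slowly,
# while the GPU prefills more slowly but decodes quickly.
npu_only = ttft_and_total(2048, 256, prefill_tps=1000, decode_tps=15)
gpu_only = ttft_and_total(2048, 256, prefill_tps=400,  decode_tps=40)
hybrid   = ttft_and_total(2048, 256, prefill_tps=1000, decode_tps=40)  # NPU prefill + GPU decode

print(npu_only, gpu_only, hybrid)
```

Under these assumptions the hybrid gets the NPU's TTFT and the GPU's decode rate, beating both single-device configurations; whether real XDNA 2 hardware achieves anything like this split is exactly the open question in the comment above.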