I see it making claims about 10x efficiency, but what is the actual tokens/second/watt? The only machines I've seen with enough memory bandwidth to do local inference effectively are Apple's M-series ARM chips in Macs.
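
A rough sketch of the metric I mean (the throughput and power figures below are made-up placeholders, not measurements of any particular chip):

    # efficiency = decode throughput divided by sustained power draw
    def tok_per_sec_per_watt(tokens_per_sec, watts):
        return tokens_per_sec / watts

    tok_per_sec_per_watt(30, 45)    # ~0.67 tok/s/W (hypothetical laptop-class chip)
    tok_per_sec_per_watt(120, 400)  # 0.30 tok/s/W (hypothetical desktop GPU)

Without numbers like that measured at the wall, a "10x efficiency" claim is hard to evaluate.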