ActivePattern, yesterday at 3:30 PM

The win is in how many weights you process per instruction and how much data you load.

So it's not that individual ops are faster — it's that the packed representation lets each instruction do more useful work, and you're moving far less data from memory to do it.
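To put rough numbers on that, here is a minimal sketch. The 2-bit weight width, the 256-bit register, and the 4096x4096 layer are my assumptions for illustration, not figures from the thread, and the helper names are made up. It packs four 2-bit codes into each byte, then compares how many weights fit in one register and how many bytes you would have to load for the whole layer.

```python
def pack_2bit(codes):
    """Pack 2-bit codes (ints 0..3) four per byte, lowest bits first."""
    out = bytearray((len(codes) + 3) // 4)
    for i, code in enumerate(codes):
        if not 0 <= code <= 3:
            raise ValueError(f"code {code} does not fit in 2 bits")
        out[i // 4] |= code << (2 * (i % 4))
    return bytes(out)


def unpack_2bit(packed, count):
    """Recover the first `count` 2-bit codes from packed bytes."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(count)]


def weights_per_register(register_bits, bits_per_weight):
    """How many weights a single register-wide instruction can touch."""
    return register_bits // bits_per_weight


def bytes_to_load(num_weights, bits_per_weight):
    """Bytes read from memory to stream all weights once (rounded up)."""
    return (num_weights * bits_per_weight + 7) // 8


REGISTER_BITS = 256        # assumed SIMD width (illustrative)
NUM_WEIGHTS = 4096 * 4096  # assumed layer size (illustrative)

for name, bits in [("fp16", 16), ("int8", 8), ("2-bit packed", 2)]:
    per_reg = weights_per_register(REGISTER_BITS, bits)
    mib = bytes_to_load(NUM_WEIGHTS, bits) / 2**20
    print(f"{name:>12}: {per_reg:3d} weights/register, {mib:5.1f} MiB loaded")
```

Under those assumptions the packed format touches 8x as many weights per register as fp16 and reads 1/8 of the bytes. The per-instruction cost is not modeled here, and real kernels also pay something to unpack or look up the packed values.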