nine_k • last Wednesday at 8:50 PM • 3 replies • view on HN

In very crude terms, AFAICT, if you have a bunch of matrix multiplications, but one of the matrices (the one with the model weights) doesn't change, you can seriously speed up the computation. For one thing, you don't need to re-fetch the elements of the constant matrix; you can keep it near the ALUs. You may also be able to detect sparse / empty blocks once, mark them, and skip them from then on.
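To make the "mark empty blocks once" idea concrete, here is a minimal software sketch. It is only an illustration, not how any particular chip does it: the function names `pack_weights` and `apply_packed`, the block size, and the list-of-lists layout are all assumptions made up for this example. The constant weight matrix is scanned a single time, only its nonzero blocks are kept, and every later multiplication touches just those blocks.

```python
def pack_weights(W, bs):
    """One-time pass over the constant weight matrix W (list of rows).

    Splits W into bs x bs blocks and keeps only blocks that contain at
    least one nonzero entry. Hypothetical helper for illustration.
    """
    rows, cols = len(W), len(W[0])
    blocks = []
    for r0 in range(0, rows, bs):
        for c0 in range(0, cols, bs):
            block = [row[c0:c0 + bs] for row in W[r0:r0 + bs]]
            if any(v != 0 for brow in block for v in brow):
                blocks.append((r0, c0, block))
    return blocks, cols


def apply_packed(x, packed):
    """Compute y = x @ W for a row vector x, visiting only stored blocks."""
    blocks, cols = packed
    y = [0] * cols
    for r0, c0, block in blocks:
        for dr, brow in enumerate(block):
            xv = x[r0 + dr]
            for dc, w in enumerate(brow):
                y[c0 + dc] += xv * w
    return y


# Weights are packed once, then reused for every input vector.
W = [
    [1, 2, 0, 0],
    [3, 4, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 5],
]
packed = pack_weights(W, bs=2)
y = apply_packed([1, 1, 1, 1], packed)
```

In this toy case two of the four 2x2 blocks are all zero, so each multiplication does half the work. The scan cost is paid once because the weights never change.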

IDK how the custom hardware exploits this; would love to hear any ideas!

Replies

guyomes • last Wednesday at 9:30 PM

> IDK how the custom hardware exploits this; would love to hear any ideas!

You might like this article [1], titled "FPGA-based CNN Acceleration using Pattern-Aware Pruning". More context and details can be found in the PhD thesis of Léo Pradels [2].

[1]: https://inria.hal.science/hal-04689673/document

[2]: https://theses.hal.science/tel-05021575v1/file/PRADELS_Leo.p...

cm2187 • last Wednesday at 11:51 PM

Random thought. Once models stabilise, could you possibly hardcode the model in gates? Or are they too large for a single chip?

fulafel • yesterday at 3:49 PM

Current accelerators (TPUs, various on-chip NPUs) are something close to this. "Systolic array" is the established computer architecture term for flowing data from computation to computation without the overhead of a register file or the von Neumann bottleneck.
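As a rough illustration of the systolic idea, the sketch below simulates a small weight-stationary array cycle by cycle. It is a toy model under stated assumptions, not a description of any real TPU or NPU: the function name `weight_stationary_matmul`, the one-register-per-hop timing, and the input skew are choices made for this example. Each processing element (PE) holds one fixed weight, activations move left to right, and partial sums move top to bottom, so nothing is fetched from a shared register file during the computation.

```python
def weight_stationary_matmul(X, W):
    """Simulate Y = X @ W on a K x N grid of PEs, where PE (k, j) holds W[k][j].

    X is M x K (activations), W is K x N (constant weights).
    Hypothetical toy model: one register hop per cycle in each direction.
    """
    M, K, N = len(X), len(W), len(W[0])
    x_reg = [[0] * N for _ in range(K)]  # activation latched in each PE
    p_reg = [[0] * N for _ in range(K)]  # partial sum leaving each PE
    Y = [[0] * N for _ in range(M)]

    for t in range(M + K + N - 2):
        new_x = [[0] * N for _ in range(K)]
        new_p = [[0] * N for _ in range(K)]
        for k in range(K):
            for j in range(N):
                if j == 0:
                    # Row k of the array is fed X[i][k], skewed by k cycles.
                    i = t - k
                    x_in = X[i][k] if 0 <= i < M else 0
                else:
                    x_in = x_reg[k][j - 1]  # from the left neighbour
                p_in = p_reg[k - 1][j] if k > 0 else 0  # from the PE above
                new_x[k][j] = x_in
                new_p[k][j] = p_in + x_in * W[k][j]
        x_reg, p_reg = new_x, new_p

        # Finished sums drop out of the bottom row, one diagonal per cycle.
        for j in range(N):
            i = t - (K - 1) - j
            if 0 <= i < M:
                Y[i][j] = p_reg[K - 1][j]
    return Y


result = weight_stationary_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

The weights are loaded into the grid once and never move, which matches the constant-weights case discussed upthread. Only activations and partial sums travel, and each travels to a direct neighbour.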