yuriyguts • yesterday at 9:03 PM

I've also been thinking about this. That said, the forward pass of a transformer model also involves some heavier operations such as normalization, reciprocals, exponentiation, and other non-linearities (GeLU, SiLU), which may (though typically don't) involve learned weights as operands.
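
To make the list above concrete, here is a rough sketch of those operations in plain Python. The function names are my own, not from any particular library, and the RMSNorm variant with a learned `gain` vector is just one assumed example of the case where a learned weight does show up as an operand.

```python
import math

def silu(x):
    # SiLU: x * sigmoid(x). One exponentiation and one reciprocal,
    # no learned weights involved.
    return x / (1.0 + math.exp(-x))

def gelu_tanh(x):
    # Common tanh approximation of GeLU, again with no learned weights.
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))

def softmax(xs):
    # Exponentiation plus a reciprocal of the sum; subtracting the max
    # first keeps exp() from overflowing.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def rms_norm(xs, gain, eps=1e-6):
    # Normalization via a reciprocal square root. `gain` stands in for a
    # learned per-channel weight (an assumption for illustration).
    mean_sq = sum(x * x for x in xs) / len(xs)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [x * inv_rms * g for x, g in zip(xs, gain)]
```

Only `rms_norm` takes a weight argument here; the activations and softmax operate purely on activations, which matches the "typically don't" part.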