Has anyone come across an r2d3-style explainer for something as high-dimensional as a Transformer's attention mechanism?
Not quite, but these help
https://poloclub.github.io/transformer-explainer/
https://youtu.be/wjZofJX0v4M?si=gT8Zlz1IY14KV_ju