Hacker News

mlmonkey yesterday at 9:55 PM

"Attention" is just a matmul: softmax(QK^T/sqrt(d))V, etc.
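
To make the "just a matmul" point concrete, here is a minimal sketch of single-head scaled dot-product attention in plain Python: two matrix products with a row-wise softmax in between. The helper names (`softmax`, `matmul`, `attention`) are made up for illustration and are not from any library; there is no masking, batching, or learned projection here.

```python
import math

def softmax(row):
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    # Plain nested-list matrix product: (n x k) @ (k x m) -> (n x m).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def attention(q, k, v):
    # softmax(Q K^T / sqrt(d)) V for a single head, no masking.
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]  # transpose K
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v)
```

With an all-zero query, every key scores the same, so the output is just the average of the value rows.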

I don't see how any planning is done in latent space. Can you point me to any papers? Thanks.

Edit: Oh, I see you're probably talking about CoCoNuT? Do all frontier models use it nowadays?


Replies

orbital-decay today at 4:26 AM

There's a lot of research on this topic. https://arxiv.org/abs/2303.08112 and https://arxiv.org/abs/2311.04897 are just two examples that come to mind.