Hacker News

ashirviskas, yesterday at 11:11 PM

What? Training is not inference. Reading books is not the same as writing.


Replies

cookiengineer, today at 1:10 AM

Maybe read up on how transformers, their encoders and decoders, and the attention matrix work?

https://arxiv.org/abs/1706.03762