
LatencyKills today at 12:54 PM

Not OP, but I worked through two books: Sebastian Raschka's "Build a Large Language Model (From Scratch)" [0] and Raj Abhijit Dandekar's "Build a DeepSeek Model (From Scratch)" [1].

I don't think there is anything in a transformer I couldn't explain in the smallest detail now.

[0]: https://www.amazon.com/Build-Large-Language-Model-Scratch/dp...

[1]: https://www.amazon.com/Build-DeepSeek-Scratch-Abhijit-Dandek...


Replies

hackinthebochs today at 1:14 PM

>I don't think there is anything in a transformer I couldn't explain in the smallest detail now.

If you're up for it, I would love to know how and why positional encodings work.
