LatencyKills • today at 12:54 PM

Not OP but I worked through Sebastian Raschka's "Build a Large Language Model (From Scratch)" [0] and Raj Abhijit Dandekar's "Build a DeepSeek Model (From Scratch)" [1] books.

I don't think there is anything in a transformer I couldn't explain in the smallest detail now.

[0]: https://www.amazon.com/Build-Large-Language-Model-Scratch/dp...

[1]: https://www.amazon.com/Build-DeepSeek-Scratch-Abhijit-Dandek...

Replies

hackinthebochs • today at 1:14 PM

>I don't think there is anything in a transformer I couldn't explain in the smallest detail now.

If you're up for it, I would love to know how and why positional encodings work.
