Hacker News

bluerooibos today at 12:39 PM

Since you spent a month digging into this, can you recommend any materials/projects to look into to get a decent grasp of how they work?


Replies

malwrar today at 5:12 PM

I’d recommend my method of just drawing out the block diagram, then drawing out and digging into the math at each step! I’m the kind of person who needs to take time to ask lots of questions before stuff clicks, and if you are too, I strongly recommend it.

I picked it up from trying to teach myself that SLAM stuff. The papers are very short but highly information-dense, and at the time there was no ChatGPT to help me. I got through them by just creeping my way through the math with a whiteboard, and something about drawing it out and having it there in my office made it all click. Trying to watch piecemeal lectures on YouTube or grind through foundational books like MVG just didn’t work for me; I used them instead as references for my drawings.

Same happened when I tried learning this GPT stuff. karpathy’s videos were out at the time, but I couldn’t really stay focused on them or connect the math with the code. Most other descriptions I could find were focused on getting you to use their inference library or harness. Assembling the picture together on my whiteboard by focusing on drawing out the block diagram continues to be my personal favorite method for deep understanding of complex systems.

LatencyKills today at 12:54 PM

Not OP, but I worked through two books: Sebastian Raschka's "Build a Large Language Model (From Scratch)" [0] and Raj Abhijit Dandekar's "Build a DeepSeek Model (From Scratch)" [1].

I don't think there is anything in a transformer I couldn't explain in the smallest detail now.

[0]: https://www.amazon.com/Build-Large-Language-Model-Scratch/dp...

[1]: https://www.amazon.com/Build-DeepSeek-Scratch-Abhijit-Dandek...
