Making Deep Learning Go Brrrr from First Principles (2022)

118 points • by tosh • today at 11:50 AM • 43 comments

Comments

Deep learning is just glorified linear algebra. Master the progression: Feed-forward → CNN → RNN → LSTM → Attention. You don't even need a GPU to understand the climax; Karpathy’s llama2.c implements a full transformer inference engine in just ~300 lines of C using SIMD pragmas for CPU execution.
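To make the "glorified linear algebra" point concrete: the core of attention is two matrix multiplies and a softmax. Below is a minimal pure-Python sketch of single-head scaled dot-product attention, softmax(QKᵀ/√d)V, with no masking, batching, or learned projections. The helper names are hypothetical and this is not code from llama2.c.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop matrix product: (n x k) @ (k x m) -> (n x m)."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    top = max(row)
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(q[0])
    scores = matmul(q, transpose(k))                    # similarity of each query to each key
    scaled = [[s / math.sqrt(d) for s in row] for row in scores]
    weights = [softmax(row) for row in scaled]          # each row of weights sums to 1
    return matmul(weights, v)                           # weighted average of the value rows


q = [[1.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[10.0, 0.0], [0.0, 10.0]]
print(attention(q, k, v))  # the query matches the first key best, so the output leans toward v[0]
```

Real implementations replace these Python loops with compiled, vectorized, or parallelized matmuls; the math stays the same.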


ollin • today at 4:31 PM

This post is a classic! Also recommended: Horace gave a related talk (covering the high-level picture of modern ML systems) at Jane Street in Dec 2024: https://www.youtube.com/watch?v=139UPjoq7Kw

ThouYS • today at 5:42 PM

I feel like there is no portable advice for performance. A PyTorch model exported to ONNX is a different model.

That ONNX model run with onnxruntime's CUDA execution provider (EP) is a different model from the one run with the TensorRT EP.

And even within the same runtime, the model behaves differently depending on the target hardware and the memory available during tuning. It is a humongous mess.

tosh • today at 12:23 PM

> in the time that Python can perform a single FLOP, an A100 could have chewed through 9.75 million FLOPS

wild
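The quoted ratio is plain division. A minimal sketch, assuming the figures the ratio is usually derived from: about 312 TFLOPS of peak dense FP16/BF16 tensor-core throughput for an A100, and about 32 million FLOP/s for scalar additions in a pure-Python loop. Both constants are assumptions here, not measurements; `python_adds_per_second` is a hypothetical helper for checking the Python side on your own machine.

```python
import timeit

# Assumed figures (not measured here): A100 dense FP16/BF16 tensor-core peak,
# and a rough rate for scalar additions in a pure-Python loop.
A100_PEAK_FLOPS = 312e12
PYTHON_FLOPS = 32e6

ratio = A100_PEAK_FLOPS / PYTHON_FLOPS
print(f"A100 FLOPs per Python FLOP: {ratio:,.0f}")  # prints: A100 FLOPs per Python FLOP: 9,750,000


def python_adds_per_second(n: int = 1_000_000) -> float:
    """Roughly measure float additions per second in a pure-Python loop (includes loop overhead)."""
    def loop() -> float:
        x = 0.0
        for _ in range(n):
            x += 1.0
        return x

    seconds = timeit.timeit(loop, number=1)
    return n / seconds
```

The measured rate varies with the machine and Python version, so the ratio is an order-of-magnitude illustration rather than a fixed constant.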


big-chungus4 • today at 4:30 PM

How does x.cos().cos() run faster than two separate cos calls? The first cos call returns a tensor either way; the only difference is that it isn't assigned to a variable. How is it even possible to know that difference in Python?
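The speedup in that example comes from operator fusion, not from skipping the variable assignment. In eager PyTorch, `x.cos().cos()` and `x1 = x.cos(); x2 = x1.cos()` do the same work: two kernels, with the intermediate written to device memory and read back. A fusing compiler (for example `torch.compile` in PyTorch 2.x) sees both ops and can emit one kernel that reads `x` once and writes the result once. A minimal pure-Python sketch of that memory-traffic difference, with hypothetical function names and Python lists standing in for GPU memory:

```python
import math
from typing import List, Tuple


def unfused_cos_cos(x: List[float]) -> Tuple[List[float], int]:
    """Two separate passes, like two eager-mode kernels.

    Returns the result and the count of element reads + writes to "memory".
    """
    traffic = 0
    x1: List[float] = []
    for v in x:  # pass 1: read x, write the intermediate x1
        x1.append(math.cos(v))
        traffic += 2
    x2: List[float] = []
    for v in x1:  # pass 2: read x1 back, write x2
        x2.append(math.cos(v))
        traffic += 2
    return x2, traffic


def fused_cos_cos(x: List[float]) -> Tuple[List[float], int]:
    """One pass, like a fused kernel: the intermediate never leaves a local variable."""
    traffic = 0
    out: List[float] = []
    for v in x:  # read x once, write the final result once
        out.append(math.cos(math.cos(v)))
        traffic += 2
    return out, traffic


x = [0.0, 0.5, 1.0, 2.0]
unfused, unfused_traffic = unfused_cos_cos(x)
fused, fused_traffic = fused_cos_cos(x)
print(unfused == fused)                # True: same math, same results
print(unfused_traffic, fused_traffic)  # 16 8
```

The results are identical; only the memory traffic halves, which is what matters for memory-bound elementwise ops like `cos`.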


noosphr • today at 12:24 PM

>For example, getting good performance on a dataset with deep learning also involves a lot of guesswork. But, if your training loss is way lower than your test loss, you're in the "overfitting" regime, and you're wasting your time if you try to increase the capacity of your model.

https://arxiv.org/abs/1912.02292 (Deep Double Descent)


axpy906 • today at 4:32 PM

Needs 2022 in title

jdw64 • today at 12:41 PM

Right now, all I know how to do is pull models from Hugging Face, but someday I want to build my own small LLM from scratch.

