bdcs • yesterday at 2:31 PM

It relies on an “unintuitive observation”[0] that running a batch costs basically the same as running a single input (up to a limit). So if you only have one inference to run, you batch it together with a lot of guesses about what comes next. If the guesses are right, you speed up the inference by up to roughly the number of guesses; if they're wrong, you're back to regular speed (and the output is still fully correct).

[0] https://x.com/karpathy/status/1697318534555336961
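
To make the guess-and-verify idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `target_model` is a made-up deterministic toy standing in for a real model, the guesses are treated as guesses at the next tokens, and `verify_batch` only simulates a batched pass with a loop (on real hardware the point is that this pass costs about the same as a single step, up to a limit).

```python
from typing import Callable, List

Token = int
Model = Callable[[List[Token]], Token]


def target_model(context: List[Token]) -> Token:
    """Hypothetical toy model: deterministically maps a context to a next token."""
    return (sum(context) * 31 + len(context)) % 10


def verify_batch(model: Model, context: List[Token], guesses: List[Token]) -> List[Token]:
    """Simulate one batched pass.

    Returns the model's next token after each prefix context + guesses[:i],
    for i = 0 .. len(guesses). A real system would compute these together.
    """
    return [model(context + guesses[:i]) for i in range(len(guesses) + 1)]


def speculative_step(model: Model, context: List[Token], guesses: List[Token]) -> List[Token]:
    """Accept guesses while they match the model, then add the model's own token."""
    predictions = verify_batch(model, context, guesses)
    accepted: List[Token] = []
    for guess, predicted in zip(guesses, predictions):
        if guess != predicted:
            break  # first wrong guess: everything after it is discarded
        accepted.append(guess)
    # The prediction at this position is always valid: it is either the
    # correction for the first wrong guess or a bonus token after all guesses.
    accepted.append(predictions[len(accepted)])
    return accepted


def plain_decode(model: Model, context: List[Token], n: int) -> List[Token]:
    """Ordinary one-token-at-a-time decoding, used as the reference output."""
    out = list(context)
    for _ in range(n):
        out.append(model(out))
    return out[len(context):]
```

With correct guesses, one call to `speculative_step` returns every guessed token plus one more; with a wrong first guess it returns exactly the single token that plain decoding would have produced. In both cases the output matches `plain_decode`, which is the "still fully correct" part.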
