Batching lowers that cost, since the model weights are read from memory once per forward pass and shared across every request in the batch. Activation memory doesn't scale as nicely: each sequence carries its own activations (and KV cache), so that traffic grows linearly with batch size and is not amortized.
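A minimal back-of-envelope sketch of this trade-off. The model size, precision, and per-sequence activation figures below are hypothetical, chosen only to illustrate how weight traffic amortizes across a batch while activation traffic does not:

```python
def bytes_per_token(n_params: float, bytes_per_param: int,
                    batch_size: int, act_bytes_per_seq: float) -> float:
    """Approximate memory traffic per generated token in one decode step."""
    # Weights are read once per forward pass and shared by the whole
    # batch, so the per-token share shrinks as batch size grows.
    weight_traffic = n_params * bytes_per_param / batch_size
    # Activation / KV-cache traffic is per sequence: it does not amortize.
    return weight_traffic + act_bytes_per_seq

# Hypothetical 7B-parameter model in fp16 (2 bytes/param),
# with 1 MB of activation traffic per sequence per step.
solo = bytes_per_token(7e9, 2, batch_size=1, act_bytes_per_seq=1e6)
batched = bytes_per_token(7e9, 2, batch_size=8, act_bytes_per_seq=1e6)
print(f"batch=1: {solo / 1e9:.2f} GB/token")   # weight reads dominate
print(f"batch=8: {batched / 1e9:.2f} GB/token")  # ~8x less weight traffic
```

With these made-up numbers, going from batch 1 to batch 8 cuts the weight traffic per token eightfold, while the 1 MB activation term stays fixed per sequence; at large enough batch sizes the activation side becomes the bottleneck.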