Batching lowers that cost, since the model weights are read from memory once per forward pass and shared across every request in the batch. Activation memory doesn't scale as nicely: each sequence carries its own activations (and KV cache), so that traffic grows linearly with batch size and is not amortized.
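A minimal back-of-envelope sketch of this trade-off. The model size, precision, and per-sequence activation figures below are hypothetical, chosen only to illustrate how weight traffic amortizes across a batch while activation traffic does not:

```python
def bytes_per_token(n_params: float, bytes_per_param: int,
                    batch_size: int, act_bytes_per_seq: float) -> float:
    """Approximate memory traffic per generated token in one decode step."""
    # Weights are read once per forward pass and shared by the whole
    # batch, so the per-token share shrinks as batch size grows.
    weight_traffic = n_params * bytes_per_param / batch_size
    # Activation / KV-cache traffic is per sequence: it does not amortize.
    return weight_traffic + act_bytes_per_seq

# Hypothetical 7B-parameter model in fp16 (2 bytes/param),
# with 1 MB of activation traffic per sequence per step.
solo = bytes_per_token(7e9, 2, batch_size=1, act_bytes_per_seq=1e6)
batched = bytes_per_token(7e9, 2, batch_size=8, act_bytes_per_seq=1e6)
print(f"batch=1: {solo / 1e9:.2f} GB/token")   # weight reads dominate
print(f"batch=8: {batched / 1e9:.2f} GB/token")  # ~8x less weight traffic
```

With these made-up numbers, going from batch 1 to batch 8 cuts the weight traffic per token eightfold, while the 1 MB activation term stays fixed per sequence; at large enough batch sizes the activation side becomes the bottleneck.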