The overhead shrinks as models get larger, so it doesn't seem that bad.
https://arxiv.org/pdf/2409.03992v2