The NYT is comically bad. Most of their (paywalled) articles include the full text in a JSON blob, and that text is typically 2-4% of the HTML by size. Most of the other 96-98% is ads and tracking. If you let those scripts actually run, you're probably looking at two orders of magnitude more overhead on top of that.
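To put rough numbers on the static part: if the text is 2-4% of the HTML, you're downloading 25-50 bytes of page for every byte of article. A minimal sketch of that arithmetic, assuming you already have the page HTML and the extracted article text as strings (the function names are made up for illustration, and nothing here knows about the NYT's actual JSON layout):

```python
def text_share(html: str, article_text: str) -> float:
    """Fraction of the page's UTF-8 bytes that is article text."""
    return len(article_text.encode("utf-8")) / len(html.encode("utf-8"))


def overhead_factor(share: float) -> float:
    """Bytes of page transferred per byte of article text."""
    return 1.0 / share


# Synthetic stand-in for a page: 40 bytes of "article", 960 bytes of everything else.
article = "a" * 40
page = article + "b" * 960

share = text_share(page, article)
print(f"text share: {share:.0%}, overhead: {overhead_factor(share):.0f}x")
```

This only counts the HTML document itself. Anything the ad and tracking scripts fetch after load isn't included, so it's a lower bound.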