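Dynamic power is roughly CV^2f (the energy per switch is CV^2), and higher clocks usually need higher voltage too, so power climbs much faster than frequency. It makes no sense to keep increasing frequency when it makes power that much worse.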
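To put rough numbers on that relationship, here's a minimal Python sketch. The capacitance, voltage, and frequency values are made up for illustration, and `dynamic_power` is just a hypothetical helper name, not anything from a real library.

```python
def dynamic_power(c_farads, v_volts, f_hertz):
    """Switching power in watts, using P = C * V^2 * f."""
    return c_farads * v_volts ** 2 * f_hertz


# Illustrative values only (assumptions, not measurements of any real chip).
base = dynamic_power(1e-9, 1.0, 2e9)

# Doubling the clock at the same voltage roughly doubles power.
same_voltage = dynamic_power(1e-9, 1.0, 4e9)

# If the voltage has to double along with the clock, power goes up about 8x.
scaled_voltage = dynamic_power(1e-9, 2.0, 4e9)

print(same_voltage / base)
print(scaled_voltage / base)
```

The second case is the painful one: when voltage has to rise with frequency, power scales closer to the cube of the clock speed.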
So heat. There are efforts to switch to optics, which doesn’t have the heat problem to the same degree, but it’s really hard to build an optical transistor. Plus, anywhere you’re interfacing with the electrical world, you’re back to the heat problem.
Maybe reversible computing will help unlock several more orders of magnitude of growth.
At lower frequencies, leakage current plays a larger role than gate capacitance (switching losses), so for any given process node there's a sweet spot. For medium to low loads, it takes less power to run at a higher frequency than is needed and rapidly power-gate the core in between bursts ("race to idle") than to run continuously at a lower frequency.
Newer process nodes decrease the per-gate capacitance, increasing the optimal operating frequency.
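Here's a toy Python model of the sweet spot, the race-to-idle effect, and the capacitance shift. Everything in it is an assumption for illustration: the linear voltage/frequency relation, the constant leakage current, the numbers, and the idea that a gated core draws zero power with free transitions. Real silicon is messier.

```python
def energy_per_cycle(f_ghz, c_nf=1.0, i_leak_a=0.5, v0=0.6, k=0.1):
    """Energy per clock cycle in nJ under a simplified, assumed model."""
    v = v0 + k * f_ghz              # assumed: voltage rises linearly with clock
    dynamic = c_nf * v ** 2         # switching energy: C * V^2 (nF * V^2 = nJ)
    leakage = i_leak_a * v / f_ghz  # leakage power spread over one cycle (W / GHz = nJ)
    return dynamic + leakage


def best_frequency(c_nf, freqs):
    """Frequency in the candidate list with the lowest energy per cycle."""
    return min(freqs, key=lambda f: energy_per_cycle(f, c_nf=c_nf))


freqs = [0.1 * i for i in range(1, 61)]  # candidates from 0.1 to 6.0 GHz

# Sweet spot: too slow wastes energy on leakage, too fast wastes it on switching.
sweet = best_frequency(1.0, freqs)

# Lower per-gate capacitance (a "newer node" in this toy model) moves it higher.
sweet_low_c = best_frequency(0.5, freqs)

# Race to idle: a load that needs 0.5 billion cycles per second.
load_ghz = 0.5
slow_watts = load_ghz * energy_per_cycle(load_ghz)  # run continuously at 0.5 GHz
burst_watts = load_ghz * energy_per_cycle(sweet)    # run at the sweet spot, gate when idle

print(sweet, sweet_low_c)
print(slow_watts, burst_watts)
```

In this model the burst-and-gate strategy wins whenever the required rate sits below the sweet spot, because each cycle is cheaper there and the gated core stops leaking. If power gating has a real wake-up cost, the advantage shrinks.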