Multicore workloads do tend to hit RAM bandwidth limits before they hit power constraints. If you do the math, running every core at max frequency and full utilization would usually imply each core could only pull a byte or so from RAM per clock cycle. Perhaps a mere handful of bytes on the highest-performance systems with in-package RAM.
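To put rough numbers on the "byte or so" figure, here is a back-of-the-envelope sketch. The core counts, clocks, and the 400 GB/s in-package figure are illustrative assumptions, not measurements of any particular product; the 89.6 GB/s value is just the theoretical peak of dual-channel DDR5-5600 (16 bytes per transfer times 5.6 GT/s). `bytes_per_core_cycle` is a made-up helper name.

```python
def bytes_per_core_cycle(bandwidth_gb_s: float, cores: int, clock_ghz: float) -> float:
    """Peak RAM bytes available to each core per clock cycle with all cores busy."""
    # GB/s divided by (cores * GHz): the 1e9 factors cancel, leaving bytes per core-cycle.
    return bandwidth_gb_s / (cores * clock_ghz)


# Assumed desktop: 16 cores at 5 GHz on dual-channel DDR5-5600 (89.6 GB/s peak).
desktop = bytes_per_core_cycle(89.6, cores=16, clock_ghz=5.0)

# Assumed in-package-RAM system: 12 cores at 4 GHz with 400 GB/s (illustrative).
in_package = bytes_per_core_cycle(400.0, cores=12, clock_ghz=4.0)

print(f"desktop:    {desktop:.2f} B/core/cycle")
print(f"in-package: {in_package:.2f} B/core/cycle")
```

These are theoretical peaks, so sustained bandwidth in practice would come in lower, which only strengthens the point.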
What percentage of the time do you think the average consumer computing device spends fully clocked up, let alone fully saturated on every core?