I wonder what the parallel hindsight about waste will be for matrix multiplications, a few years from now.
The economic incentives line up much better there. You charge for tokens -> cost is GPUs -> you work very hard to keep GPUs utilized 100% and get max tokens out of those cycles.
Compare this to essentially any modern business app: the product being sold has very little relationship to CPU cycles, or the CPU cycles are SO cheap relative to what you're getting paid that no one cares to optimize.
By then, as I understand it, matrix multiplication will have cured cancer and invented unlimited free energy, so no hindsight about waste will be needed.