You're assuming consistent hardware & software profiles. The way these things work at scale is essentially a compiler/instruction-scheduling problem: you can think of the different CPU/GPU combinations as the pipelines of what is basically a data-center-scale computer. The function graph is broken up into parts, compiled for different hardware profiles w/ different kernels, & then deployed & stitched together to maximize hardware utilization while minimizing cost. Service providers aren't doing this b/c they want to but b/c they want to be profitable: every hardware cycle that isn't used for querying or optimization is basically wasted money.
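To make the scheduling framing concrete, here's a toy sketch of the placement step. Everything in it is made up for illustration: the part names, the hardware profiles, the prices, the runtimes, & the greedy heuristic itself. Real providers don't publish their schedulers, so treat this as one possible shape of the problem, not a description of anyone's actual system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """A hypothetical hardware profile (names and numbers are illustrative)."""
    name: str
    price_per_sec: float  # assumed cost of occupying one slot for one second
    free_slots: int       # assumed number of graph parts it can host right now


def place(parts, profiles, runtime):
    """Greedily assign each graph part to the cheapest profile with a free slot.

    runtime[(part, profile_name)] is the assumed runtime in seconds of the
    kernel compiled for that part on that profile.
    """
    slots = {p.name: p.free_slots for p in profiles}
    price = {p.name: p.price_per_sec for p in profiles}

    def cost(part, name):
        return runtime[(part, name)] * price[name]

    # Place the most expensive parts first so they get first pick of hardware.
    order = sorted(parts, key=lambda part: -min(cost(part, n) for n in price))

    plan = {}
    for part in order:
        candidates = [n for n in slots if slots[n] > 0]
        if not candidates:
            raise RuntimeError(f"no free hardware for {part}")
        best = min(candidates, key=lambda n: cost(part, n))
        slots[best] -= 1
        plan[part] = best
    return plan


profiles = [
    Profile("gpu_a", price_per_sec=3.0, free_slots=1),
    Profile("gpu_b", price_per_sec=1.0, free_slots=2),
    Profile("cpu", price_per_sec=0.2, free_slots=1),
]
runtime = {
    ("heavy", "gpu_a"): 1.0, ("heavy", "gpu_b"): 4.0, ("heavy", "cpu"): 40.0,
    ("light_1", "gpu_a"): 0.1, ("light_1", "gpu_b"): 0.2, ("light_1", "cpu"): 0.5,
    ("light_2", "gpu_a"): 0.05, ("light_2", "gpu_b"): 0.1, ("light_2", "cpu"): 0.3,
}
plan = place(["heavy", "light_1", "light_2"], profiles, runtime)
print(plan)
```

Even in this toy version, which hardware a given part lands on depends on what else is occupying the fleet at that moment, so the same request can end up on a different hardware/kernel mix from one run to the next.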
You'll never get any major company to agree to your proposal b/c it would mean providing a real SLA for all of their customers, & they'll never agree to that.