some of this is what khronos standards are theoretically supposed to achieve.
surprise, it's very difficult to do across many hw vendors and classes of devices. it's not a coincidence that metal is much easier to program for.
maybe consider joining khronos since you apparently know exactly how to achieve this very simple goal...
> it's not a coincidence that metal is much easier to program for
Tbf, Metal also works on non-Apple GPUs, with only minimal additional hints needed to manage resources in non-unified memory.