If you're talking about APIs and SDKs, whether that's direct API calls or driving tools like Claude Code or Codex with no human in the loop, I think it's actually fairly straightforward to switch between the various tools.
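To make the "straightforward to switch" point concrete, here's a rough sketch of how I'd keep the swap cheap: treat every model as a plain prompt-in, text-out function and look it up by name. Everything here is made up for illustration (`vendor_a`, `vendor_b`, `make_registry`, `complete`); the stubs stand in for whatever real SDK call you'd wrap.

```python
from typing import Callable, Dict

# A "model" is just a function from prompt text to reply text.
Model = Callable[[str], str]


def make_registry() -> Dict[str, Model]:
    """Build a name -> model lookup table.

    The two vendors below are hypothetical stubs. In a real setup each one
    would wrap a vendor SDK or HTTP call behind the same signature.
    """

    def vendor_a(prompt: str) -> str:
        return f"[vendor_a] {prompt}"

    def vendor_b(prompt: str) -> str:
        return f"[vendor_b] {prompt}"

    return {"vendor_a": vendor_a, "vendor_b": vendor_b}


def complete(registry: Dict[str, Model], name: str, prompt: str) -> str:
    """Route a prompt to whichever backend is selected by name."""
    return registry[name](prompt)
```

With that shape, switching tools is a one-string change at the call site, e.g. `complete(make_registry(), "vendor_b", "hello")`.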
If you're talking about output quality, then yeah, that's not as easy. But for product outputs (building a customer service agent or something like that), a well-designed eval harness plus testing and iteration can get you some degree of convergence between models of similar generations. Coding is similar (iterate, measure), but harder to eval.
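For what I mean by an eval harness, a minimal sketch, assuming you can express each test case as a prompt plus a pass/fail check on the reply. The models here (`model_a`, `model_b`) are hypothetical stubs, and the cases and checks are toy examples, not a real customer service eval.

```python
from typing import Callable, Dict, List, Tuple

Model = Callable[[str], str]
# Each case pairs a prompt with a check that returns True if the reply passes.
Case = Tuple[str, Callable[[str], bool]]


def pass_rate(model: Model, cases: List[Case]) -> float:
    """Fraction of cases whose check accepts the model's reply."""
    passed = sum(1 for prompt, check in cases if check(model(prompt)))
    return passed / len(cases)


def compare(models: Dict[str, Model], cases: List[Case]) -> Dict[str, float]:
    """Run the same cases against every model and report pass rates by name."""
    return {name: pass_rate(model, cases) for name, model in models.items()}


# Hypothetical stub models standing in for real API-backed ones.
def model_a(prompt: str) -> str:
    if "refund" in prompt:
        return "You can request a refund within 30 days."
    return "Sorry, I can't help."


def model_b(prompt: str) -> str:
    return "Please contact support."


# Toy cases: keyword checks only, purely for illustration.
CASES: List[Case] = [
    ("How do I get a refund?", lambda reply: "refund" in reply.lower()),
    ("Where is my order?",
     lambda reply: "sorry" in reply.lower() or "support" in reply.lower()),
]

if __name__ == "__main__":
    print(compare({"model_a": model_a, "model_b": model_b}, CASES))
```

The loop is then: change the prompt or config for the weaker model, rerun `compare`, and see whether the gap closes. Coding tasks fit the same shape, but writing a trustworthy `check` is the hard part.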