This is an incredibly silly comparison. It amounts to claiming that a Ford Pinto is just as good of a car as a Rolls-Royce by simply observing that both cars got a person from point A to point B. After all, once someone reaches their destination you can hardly tell what vehicle they actually used to get there, but that doesn't mean there's no difference between vehicles.
What matters most in state-of-the-art models isn't simply the final destination; it's the process of how one arrives at that destination.
In my test the prompt was the same and all suggestions were auto-accepted, so indeed there was no difference other than the model and the harness. The number of characters typed and the interaction with the harnesses were exactly the same.
I think your analogy makes the opposite case better. A Rolls-Royce and a Pinto have the same real-world commute time because horsepower isn't the bottleneck, and they both get passengers from point A to point B. Sure, the Pinto explodes a bit, but much like the actuaries at Ford, you might well judge the cost of an occasional explosion to be a trade-off you can easily compensate for.
I would argue the process these days has more to do with the harness than the model, at least when we're talking about the SOTA options. Claude Code's biggest advantage isn't Opus; rather, it's the knowledge the community has been building and sharing around using it effectively. Almost all of the out-of-the-box tutorials, skills, and frameworks are built for Claude first, then maybe Codex.
I'd go further and say that CC and Codex are not even the best harnesses available; they just offer the most heavily subsidized rate plans.