An then the S3 team ships some internal code changes wich results in subtle difference in how the API behaves in edge cases and you "emulator" is wrong again.
Not really, the main thing they could change is the order in which errors are checked when there are multiple errors, but even a lot of that is not likely to change, eg html based stuff comes first. There are actual reasons for a lot of the subtleties, plus they have to have a stable contract for all the success cases as people rely on it. And its not hard to update, I have 800 or so tests against AWS that run in a few minutes.
Actually no, because they actually have to worry about their millions of customer deployments not randomly breaking every other week. So, they are generally very good about not breaking compatibility unless they have a really good reason to. And when they do, they typically give people years of time to deal with any backwards compatibility breaking changes. Because otherwise they are dealing with lots of support overhead and angry customers. That and the multiple independently developed s3 compatible clients or alternatives make this probably the easiest thing to simulate in the AWS universe as it is probably the most widely used thing in AWS.