This is a very uncharitable interpretation of the twitter post: "It’s a combination of anthropic’s stance of not doing human reviews or any kind of rational roll out and stabilization."
They mention nothing about agents being used, rather focus on humans in the review cycle and some sort of gated roll-out process. Why we would bin these practices in the name of a faster release cycle is an important question & debate.
I kind of agree, but it goes both ways. Has Jarred said that there was no review? I know that he stated that rust bun passes tests. Now, I don't know the amount or quantity of coverage, but as a thought experiment, let's assume they are good. What does that count for?