Yes, that's true. Excluding Fable, OAI models are the most refusal heavy. However, I'd rather get a refusal than response with poisoned output.
Since currently there's no way to verify if poisoning happened or not, I don't trust Anthropic anymore, regardless of what they say.
But my trust towards OAI is also brittle - what if they also do it, or start doing it?
I want to have a verifiable way to know that the prompt I sent was the prompt the model received. I want to know if anything was injected as well - I understand they may not necessarily be able to reveal the exact steering, but at least give me the steering category and its hash or something.
What kind of work are you getting refusals on? Genuinely curious. The only refusal I’ve had in recent memory was declining to find doorbell camera footage matching a certain description, which is fair enough and I think EU laws heavily restrict such activities (even tho I’m not in the EU)