silisili • yesterday at 8:18 AM

OpenAI has been the absolute worst about this, historically. I found myself having to change my queries because it refused to serve things it deemed insensitive.

Replies

gck1 • yesterday at 8:50 AM

Yes, that's true. Excluding Fable, OAI models are the most refusal-heavy. However, I'd rather get a refusal than a response with poisoned output.

Since currently there's no way to verify if poisoning happened or not, I don't trust Anthropic anymore, regardless of what they say.

But my trust in OAI is also brittle - what if they also do it, or start doing it?

I want to have a verifiable way to know that the prompt I sent was the prompt the model received. I want to know if anything was injected as well - I understand they may not necessarily be able to reveal the exact steering, but at least give me the steering category and its hash or something.
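To make the "category and its hash" idea concrete, here is a rough sketch of the client-side check I have in mind. Everything about the receipt is hypothetical: no provider returns one today as far as I know, and the field names (`received_prompt_sha256`, `injections`, `category`, `sha256`) are made up for illustration. It also only checks that the receipt is consistent with what I sent; it can't prove the provider filled it in honestly.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Hex SHA-256 of the UTF-8 encoded text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def check_receipt(sent_prompt, receipt):
    """Return a list of problems; an empty list means the receipt is consistent.

    `receipt` is a HYPOTHETICAL structure a provider could return alongside
    the completion. It is not part of any real API.
    """
    problems = []
    # The provider claims which prompt it received; compare with what we sent.
    if receipt.get("received_prompt_sha256") != sha256_hex(sent_prompt):
        problems.append("prompt hash mismatch")
    # Each injected steering block discloses only a category and a hash,
    # not the steering text itself.
    for item in receipt.get("injections", []):
        if not item.get("category") or not item.get("sha256"):
            problems.append("injection missing category or hash")
    return problems


prompt = "Summarize this changelog."
receipt = {
    "received_prompt_sha256": sha256_hex(prompt),
    "injections": [
        {"category": "safety", "sha256": sha256_hex("hidden steering text")},
    ],
}
print(check_receipt(prompt, receipt))  # []
```

For this to mean anything, the receipt would have to be signed or attested by something I can trust more than the provider's word, which is the hard part.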
