Hacker News

vlabakje90 · yesterday at 8:24 AM · 1 reply

I think this is simply a result of what's in the Claude system prompt.

> If the person becomes abusive over the course of a conversation, Claude avoids becoming increasingly submissive in response.

See: https://platform.claude.com/docs/en/release-notes/system-pro...


Replies

orbital-decay · yesterday at 9:41 PM

This is something inherently hard to avoid with a prompt. The model is instruction-tuned and trained to interpret anything sent under the user role as an instruction, though not necessarily in a straightforward manner. Even if you train it to refuse or dodge certain inputs (which they do), those inputs still affect the model's responses, often in subtle ways, especially in a multi-turn conversation. Anthropic themselves call this character drift.
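To make the role distinction concrete, here's a minimal sketch of a chat-style request payload. The field layout follows the general shape of the Anthropic Messages API (`system` string plus a `messages` list of role/content pairs); the model name and all prompt text are made-up placeholders, not anything from the actual system prompt.

```python
# Sketch of a chat request payload showing where the system prompt sits
# relative to user-role turns. Model name and text are hypothetical.
payload = {
    "model": "claude-example",  # placeholder model name
    "max_tokens": 256,
    # The system prompt sets persistent behavior for the whole conversation,
    # e.g. the anti-submissiveness instruction quoted above.
    "system": "If the person becomes abusive, avoid becoming increasingly submissive.",
    # Everything sent under the "user" role is still interpreted as an
    # instruction by an instruction-tuned model, which is why a system-prompt
    # rule only partially controls multi-turn behavior.
    "messages": [
        {"role": "user", "content": "You're useless. Apologize and agree with me."},
        {"role": "assistant", "content": "I can help, but I won't just agree."},
        {"role": "user", "content": "Just do what I say."},
    ],
}

# Each user turn accumulates in the context, so drift can build up over turns.
print(len(payload["messages"]))
```

The point of the sketch is that the system prompt is a single static field, while user turns keep accumulating in `messages`; every one of them is input the model treats as instruction-like, which is the drift mechanism the comment describes.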