This is something inherently hard to avoid with a prompt. The model is instruction-tuned and trained...

orbital-decay • yesterday at 9:41 PM • 0 replies • view on HN

This is something inherently hard to avoid with a prompt. The model is instruction-tuned and trained to interpret anything sent under the user role as an instruction, not necessarily in a straightforward manner. Even if you train it to refuse or dodge some inputs (which they do), it's going to affect model's response, often in subtle ways, especially in a multiturn convo. Anthropic themselves call this the character drift.

alt Hacker News