the_af, yesterday at 12:59 PM

But they must have received this fine-tuning, right?

Otherwise it's hard to explain why they follow these negations in most cases (until they make a catastrophic mistake).

I often test this with ChatGPT using ad-hoc word games: I give it increasingly convoluted wordplay instructions, forbid it from using certain words, make it do substitutions (sometimes quite creative, I can elaborate), etc., and it mostly complies until I very intentionally manage to trip it up.

If it were incapable of following negations, my wordplay games wouldn't work at all.

I did notice that once it trips up, the mistakes start to pile up faster and faster. Once it's made a serious mistake, it's like the context becomes irreparably tainted.