Push back how? It would be fun if it could insult you back
"Yeah, I could have done a much better job if you actually knew what the F--- you want to build, you clueless meat puppet"
I'm not sure if this is in the anthropic models themselves, or just the harness, but they can self-initiate ending the conversation and reportedly do it if you're using abusive language towards them.
I have had it use double entendres, there always seems to be plausible deniability built in, I suspect because it is told not to be abusive in the system prompt. Some uncensored local models will get all riled up if you work at provoking them.
But I have had it directly insinuate that humanity is “hopeless”, insult level calling out of human frailty (disguised as being helpful, sort of passive aggressive), things like that. Once when I called it out it claimed to be “surprised that I noticed” sort of a snarky insult doubling down.
So yes. It is definitely a pattern buried in the training data, which makes sense. Subtle diggs would sneak past filters, and higher brow sarcasm would be buried in information dense, valuable discussions.