> that if used as a coding agent, an AI model affiliated with a country's government might try to make my software susceptible to attacks by that government's intelligence forces.
Note that if such a trigger were to exist, the behavior has to be completely reproducible by definition, e.g. when put into the right setting with the right input context, the model starts behaving maliciously with at least some well-defined probability. I don't think any such incident has ever been described, it's a purely theoretical concern.
Such incidents have been extensively described. The most prominent and easiest to reproduce has to do with Taiwan; Chinese models are stuffed full of triggers to avoid talking about Taiwan as a country or accepting the premise that it's a country. Try asking Deepseek about country code +886!
I don't think it's a stretch that you can train/align a model to avoid "hatespeech" or other topics deemed $Unacceptable you can align a model to favor a certain ideological viewpoint and have that alignment subtly influence the output.
How do most Chinese models handle Tienanmen square or discussions on Han superiority?