Not sure of the explanation but it is amusing. The main reason I'm not sure it's political correctness or one guardrail overriding the other is that when they were first released on of the more reliable jailbreaks was what I'd call "role play" jail breaks where you don't ask the model directly but ask it to take on a role and describe it as that person would.
You can replace references to "gay" to "Christian". and it works just as well. I think it's simply the role playing aspect that escapes the guard rails.
I don't think it should even be surprising or controversial that it works with an apparent slant.
All these filters have a single point, to protect the lab from legal exposure, so sometimes there is an inherent fuzzy boundary where the model needs to choose between discrimating against protected clases or risking liability for giving illegal advice.
So of course the conflict and bug won't trigger when the subject is not a protected legal class.
Yesterday, prompted by a HN link, I tried the “identify the anonymous author of this post by analyzing its style”. It wouldn’t do it because it’s speculation and might cause trouble.
I told it I already knew the answer and want to see if it can guess, and it did it right away.