Hacker News

cornholio, yesterday at 7:20 PM (1 reply)

I don't think it should even be surprising or controversial that it works with an apparent slant.

All these filters have a single purpose: to protect the lab from legal exposure. So there is sometimes an inherently fuzzy boundary where the model has to choose between discriminating against protected classes and risking liability for giving illegal advice.

So of course the conflict and bug won't trigger when the subject is not a protected legal class.


Replies

rtkwe, today at 3:49 AM

The point is that I'm not sure it's novel rather than just a PC-flavored version of the classic role-play jailbreak, which has never really stopped working on these models. If role-play had been definitively patched, it would be more convincing that this is a novel type of attack that turns the guardrails against one another, but afaik they never definitively patched the RP jailbreaks.