
benkaiser, today at 2:06 AM

There are often "abliterated" or "uncensored" tuned models that suppress the refusals. From my high-level understanding, this is done by finding the direction in the model's internal activations that lights up on refusals and damping it in the weights, so the model is less likely to refuse. It doesn't help if the model doesn't know what you're asking about, though (i.e. if the model never actually learned about meth production in the first place).
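
For anyone curious what that looks like mechanically, here is a toy NumPy sketch of the idea as I understand it, not the code of any particular abliteration tool. It assumes you have already collected activations for prompts the model refuses and prompts it answers; the arrays, shapes, and the function names `refusal_direction` and `ablate_direction` are all made up for illustration.

```python
import numpy as np

def refusal_direction(refused_acts, answered_acts):
    """Unit vector along the difference of mean activations (hypothetical helper)."""
    diff = refused_acts.mean(axis=0) - answered_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)

def ablate_direction(W, direction):
    """Project `direction` out of everything W writes.

    Assumes W has shape (d_model, d_in) and the layer computes out = W @ x.
    """
    return W - np.outer(direction, direction) @ W

# Toy data standing in for real activations (assumption: 8-dim hidden state).
rng = np.random.default_rng(0)
d_model, d_in = 8, 4
answered = rng.normal(size=(32, d_model))
refused = rng.normal(size=(32, d_model))
refused[:, 0] += 5.0  # fake "refusal" signal along one axis

r = refusal_direction(refused, answered)
W = rng.normal(size=(d_model, d_in))
W_abl = ablate_direction(W, r)

# After ablation, the layer's output has no component along r.
x = rng.normal(size=d_in)
leak = float(r @ (W_abl @ x))
print(abs(leak) < 1e-9)
```

Note this only removes the tendency to refuse; nothing here adds knowledge the model doesn't have.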