hleszek • yesterday at 6:11 PM • 2 replies

For open-weights models, censorship removal is now a "solved" problem. If you wait a few days after a new model release, someone will have made a heretic (https://github.com/p-e-w/heretic) version with the censorship removed, so in a way the only remaining use for censorship is to avoid lawsuits, not to reduce improper usage.

Replies

jakkos • yesterday at 6:17 PM

Any time I've tried an "abliterated" model, heretic or otherwise, it has always damaged the capabilities of the original model, and it still often refuses or produces garbage on a lot of "unsafe" requests.

avazhi • today at 1:05 AM

The problem is that the heretic and abliterated versions are dog shit quality compared to the unedited versions and much more likely to hallucinate.

AFAIK abliteration isn't even possible without some quality reduction, even if it's marginal. All the benchmarks reflect this.
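
For readers unfamiliar with the technique, here is a minimal toy sketch of directional ablation (weight orthogonalization), the idea usually described under the name "abliteration": estimate a "refusal direction" r as the normalized difference between mean activations on harmful and harmless prompts, then project r out of a weight matrix W that writes into the residual stream, W' = (I - r r^T) W. This is not heretic's actual implementation; the 3-D vectors, the toy matrix, and the helper names (refusal_direction, orthogonalize, matvec) are hypothetical and chosen only so the arithmetic is easy to follow.

```python
# Toy sketch of directional ablation ("abliteration"), pure Python, 3-D vectors.
# Helper names and data are hypothetical; real tools edit full transformer weights.
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major, shape (d_model, d_in)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> Vector:
    # Element-wise mean of equally sized vectors.
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def refusal_direction(harmful: List[Vector], harmless: List[Vector]) -> Vector:
    # Difference of mean activations, scaled to unit length.
    diff = [a - b for a, b in zip(mean(harmful), mean(harmless))]
    norm = dot(diff, diff) ** 0.5
    return [x / norm for x in diff]


def orthogonalize(weight: Matrix, r: Vector) -> Matrix:
    # W' = (I - r r^T) W: removes the r-component from every output W can write.
    # Assumes r has unit length and len(r) equals the number of rows of W.
    rows, cols = len(weight), len(weight[0])
    r_t_w = [sum(r[i] * weight[i][j] for i in range(rows)) for j in range(cols)]
    return [[weight[i][j] - r[i] * r_t_w[j] for j in range(cols)] for i in range(rows)]


def matvec(weight: Matrix, x: Vector) -> Vector:
    return [dot(row, x) for row in weight]


if __name__ == "__main__":
    # Toy activations: harmful and harmless means differ only along axis 0.
    harmful = [[3.0, 1.0, 0.0], [3.0, -1.0, 0.0]]
    harmless = [[1.0, 1.0, 0.0], [1.0, -1.0, 0.0]]
    r = refusal_direction(harmful, harmless)
    print(r)  # [1.0, 0.0, 0.0]

    w = [[1.0, 0.5], [0.0, 1.0], [0.0, 0.0]]
    w_ablated = orthogonalize(w, r)

    x = [1.0, 1.0]  # stands in for an ordinary, harmless input
    print(matvec(w, x))          # [1.5, 1.0, 0.0]
    print(matvec(w_ablated, x))  # [0.0, 1.0, 0.0]
```

In this toy case the ablated matrix can no longer write anything along r, so the harmless input also loses its 1.5 component. That is one plausible intuition for why some quality loss is hard to avoid: whatever useful signal happens to share the removed direction goes away with the refusals.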
