Hacker News

hleszek, yesterday at 6:11 PM (2 replies)

For open-weights models, censorship removal is now a "solved" problem. If you wait a few days after a new model release, someone will have made a heretic (https://github.com/p-e-w/heretic) version with the censorship removed. So, in a way, the only use for censorship now is to avoid lawsuits, not to reduce improper usage.


Replies

jakkos, yesterday at 6:17 PM

Any time I've tried an "abliterated" model, heretic or otherwise, it has had damaged capabilities compared to the original model, and it will still often refuse "unsafe" requests or produce garbage in response to them.

avazhi, today at 1:05 AM

The problem is that the heretic and abliterated versions are dog shit quality compared to the non-edited versions and much more likely to hallucinate.

AFAIK abliteration isn’t even possible without some quality reduction, even if it’s marginal. All the benchmarks reflect this.