Hacker News

raincole, yesterday at 10:19 PM

Censorship on image generation models works on another level. The models themselves can generate NSFW images, but separate computer vision models check each output before it is shown to the user. It's especially obvious for Grok and ChatGPT.
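A rough sketch of that setup, to make the two stages concrete. Everything here is hypothetical: `moderated_generate`, `fake_generate`, `fake_nsfw_score`, and the 0.5 threshold are illustrative stand-ins, not how Grok or ChatGPT actually implement it.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Image:
    pixels: bytes  # stand-in for real image data


def moderated_generate(
    prompt: str,
    generate: Callable[[str], Image],
    nsfw_score: Callable[[Image], float],
    threshold: float = 0.5,  # assumed cutoff, purely illustrative
) -> Optional[Image]:
    """Generate an image, then let a separate classifier decide whether to show it."""
    image = generate(prompt)  # the generator itself is not restricted
    if nsfw_score(image) >= threshold:
        return None  # the image was produced but is withheld from the user
    return image


# Toy stand-ins so the sketch runs without any real model.
def fake_generate(prompt: str) -> Image:
    return Image(pixels=prompt.encode())


def fake_nsfw_score(image: Image) -> float:
    return 1.0 if b"nsfw" in image.pixels else 0.0


shown = moderated_generate("a cat", fake_generate, fake_nsfw_score)
blocked = moderated_generate("nsfw thing", fake_generate, fake_nsfw_score)
```

The point of the design is that the check runs on the finished image rather than on the prompt or the model weights, so the generator and the filter can be changed independently.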


Replies

BoorishBears, yesterday at 11:33 PM

There are image models with censorship at every stage from pretraining to posttraining.

Most recently, Ideogram released an open-weight model that, for certain prompts, will denoise into a grey image with a "Blocked by safety filter" notice.

Of course, because it's open weights, people have found ways to defeat it.