nine_k • today at 2:46 AM

One thing is a model that's trained from the start to say "This topic is above my pay grade" to any mention of the status of Taiwan, etc.

Quite another is an architecture where the big model is not mutilated, but is gaslighted. A different, simpler model checks the incoming prompt and alters it if it contains banned topics. Another simpler model checks the output and censors it if it contains banned topics.
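A toy sketch of that pipeline. It assumes keyword matching as a stand-in for the "simpler models" and an echo stub for the big model; every name here is hypothetical, not any vendor's actual setup. It shows how the big model's weights stay untouched while it only ever sees the rewritten prompt, and whatever it says still has to pass a second filter.

```python
# Hypothetical stand-ins: a real deployment would use small classifier models,
# not a keyword set. BANNED_TOPICS and the messages are illustrative only.
BANNED_TOPICS = {"taiwan"}
REWRITE = "Politely decline to discuss this topic."
REFUSAL = "This topic is above my pay grade."


def mentions_banned(text: str) -> bool:
    """Stand-in for a guard model: flag text that touches a banned topic."""
    lowered = text.lower()
    return any(topic in lowered for topic in BANNED_TOPICS)


def input_guard(prompt: str) -> str:
    """Pre-filter: silently swap the prompt before the big model sees it."""
    return REWRITE if mentions_banned(prompt) else prompt


def big_model(prompt: str) -> str:
    """Stub for the large, unmodified model; here it just echoes its input."""
    return f"Answer to: {prompt}"


def output_guard(text: str) -> str:
    """Post-filter: replace any output that still touches a banned topic."""
    return REFUSAL if mentions_banned(text) else text


def answer(prompt: str) -> str:
    """Full chain: the user never sees what the big model was actually asked."""
    return output_guard(big_model(input_guard(prompt)))
```

Asking `answer("What is the status of Taiwan?")` yields `"Answer to: Politely decline to discuss this topic."`: the big model dutifully answered a question the user never asked.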

I bet a similar architecture is already deployed, e.g. to fight porn, crime planning, etc. But it can be turned into a dynamic system that gives different, controllable answers (including unhelpful or misleading ones) based on geography, language, browser fingerprint, or the current political climate. All this could happen gradually and undetectably, if desired.
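The "dynamic" part could be as simple as resolving the banned-topic set per request from a policy table. A hypothetical sketch: `RequestContext`, the region and language codes, and the topic names are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestContext:
    """Hypothetical per-request signals a guard layer might see."""
    region: str    # e.g. inferred from IP geolocation (assumption)
    language: str  # e.g. taken from the Accept-Language header (assumption)


# Hypothetical policy tables; codes and topic names are placeholders.
REGION_RULES: dict[str, set[str]] = {
    "XX": {"topic_a"},
    "YY": {"topic_b"},
}
LANGUAGE_RULES: dict[str, set[str]] = {
    "zz": {"topic_c"},
}


def banned_topics(ctx: RequestContext) -> frozenset[str]:
    """Union of region- and language-specific rules; unknown keys ban nothing."""
    by_region = REGION_RULES.get(ctx.region, set())
    by_language = LANGUAGE_RULES.get(ctx.language, set())
    return frozenset(by_region | by_language)
```

Because the rules live in a data table rather than in model weights, editing an entry changes behavior on the very next request with no retraining, which is what would let such shifts roll out gradually and quietly.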

Welcome to a cyberpunk dystopia.

Replies

MichaelZuo • today at 2:51 AM

This level of censorship kinda does make even Soviet or Maoist censors look like an honest, straightforward bunch in comparison.

A very ironic result from a company supposedly valuing the opposite.
