logoalt Hacker News

gchamonliveyesterday at 1:36 PM0 repliesview on HN

Just assume each model iteration incorporates all the censorship prompts before and compile the possible list from the system prompt history. To validate it, design an adversary test against the items in the compiled list.