
Terr_, today at 7:08 AM

I worry that "boiling" is still too optimistic a metaphor, since the real process isn't as simple or foolproof as boiling. It's more like a complex fermentation, where a malicious input can hijack how it works and generate something more dangerous than what you put in.

Even if the output is only shown to a human, imagine a comment in a thread that tricks an LLM into "summarizing" a false account where other innocent people said terrible ban-worthy things.