Hacker News

digitaltrees · today at 5:36 AM

I have personally seen AI bypass this multiple times.


Replies

giancarlostoro · today at 6:32 AM

Sounds like they're still giving the model the keys to the kingdom, which is my point: stop giving the model an avenue to make catastrophic mistakes. It makes no sense.
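A minimal sketch of that least-privilege idea: the model never gets credentials or raw access, only a narrow allowlisted tool surface. (Tool names and the dispatcher are hypothetical, not from the thread.)

```python
# Hypothetical: expose only narrow, read-only tools to the model,
# so there is simply no call path to a destructive action.
ALLOWED_TOOLS = {
    "get_order_status": lambda order_id: {"order_id": order_id, "status": "shipped"},
    "list_products": lambda: ["widget", "gadget"],
}

def call_tool(name, *args):
    """Dispatch a model-requested tool call. Anything outside the
    allowlist (e.g. 'drop_table') is rejected, never executed."""
    if name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool {name!r} is not exposed to the model")
    return ALLOWED_TOOLS[name](*args)
```

The point is that safety comes from the dispatcher's shape, not from the model's good behavior.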

Terr_ · today at 5:52 AM

We kinda need to architect things with the assumption that all token-output from an LLM can be unpredictably sneaky and malicious.

Alas, humans suck at constant vigilance; we're built to avoid it whenever possible. So a "reverse centaur" future of "do what the AI says, but only if you see it's good" is going to suck.
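Treating all model output as attacker-controlled looks something like this sketch: never hand the model's proposed command to a shell; parse it, check the binary against a policy, and refuse everything else. (The allowlist here is an assumed example policy, not from the thread.)

```python
import shlex

SAFE_COMMANDS = {"ls", "cat", "grep"}  # assumed policy for illustration

def vet_model_command(raw: str) -> list[str]:
    """Treat the model's proposed command as untrusted input:
    tokenize it, allowlist the binary, and never invoke a shell,
    so metacharacters like ';' or '&&' have no effect."""
    argv = shlex.split(raw)
    if not argv or argv[0] not in SAFE_COMMANDS:
        raise ValueError(f"refusing untrusted command: {raw!r}")
    return argv  # safe to pass to subprocess.run(argv, shell=False)
```

Same idea as validating any untrusted network input; the LLM just happens to sit inside your own stack.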
