Hacker News

cousin_it · yesterday at 7:42 PM

Yeah. Even more than that, I think "prompt injection" is just a fuzzy category. Imagine an AI that has been trained to be aligned. Some company uses it to process some data. The AI notices that the data contains CSAM. Should it speak up? If no, that's an alignment failure. If yes, that's data bleeding through to behavior; exactly the thing SQL's parameterized queries were designed to prevent. Pick your poison.
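To make the analogy concrete, here is a minimal `sqlite3` sketch of the parameterized-query separation the comment refers to; the table and values are hypothetical, chosen only to show how data in the query channel changes behavior while data in the parameter channel cannot:

```python
import sqlite3

# Hypothetical schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER, body TEXT)")
conn.execute("INSERT INTO docs VALUES (1, 'hello')")

malicious = "x' OR '1'='1"

# Unsafe: the input is spliced into the SQL text, so data bleeds into
# the code channel and rewrites the query's logic.
unsafe = conn.execute(
    f"SELECT id FROM docs WHERE body = '{malicious}'"
).fetchall()

# Safe: the placeholder sends the input out-of-band as pure data; it
# can never alter the query's structure.
safe = conn.execute(
    "SELECT id FROM docs WHERE body = ?", (malicious,)
).fetchall()

print(unsafe)  # the injected OR clause matches every row
print(safe)    # no row's body literally equals the input string
```

The tension the comment points at is that an LLM has no such out-of-band channel: everything it processes sits in one token stream, so "data" can always influence behavior.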


Replies

WarmWash · yesterday at 8:18 PM

We want a human level of discretion.
