Hacker News

lxgr, yesterday at 11:03 AM

It works well so far, for you.

Are you confident it would still work against sophisticated prompt injection attacks that override your "strongly worded message"?

Strongly worded signs can be great for safety (actual mechanisms preventing undesirable actions from being taken are still much better), but are essentially meaningless for security.


Replies

unshavedyak, yesterday at 3:46 PM

Not sure about OP’s implementation, but the wording doesn’t matter. The hook blocks whatever action you tell it to block. E.g., it’s impossible for Claude to use emojis for me. My hook doesn’t allow it.

So it’s deterministic, based on however the script is written.
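
For illustration, here is a minimal sketch of what such a hook script might look like. It is not the commenter's actual script. It assumes a hook runner that pipes a JSON description of the pending action to the script on stdin, with the action's arguments under a `tool_input` key, and that treats exit code 2 as "block this action". Those conventions, the emoji ranges, and the names `check` and `run` are all assumptions here, so check them against your own tool's hook documentation.

```python
import json
import re
import sys

# Rough emoji ranges, chosen for illustration only (not a complete list).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\U0001F1E6-\U0001F1FF]")


def collect_strings(value):
    """Yield every string nested anywhere inside a JSON-like value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from collect_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from collect_strings(item)


def check(payload):
    """Return (exit_code, message); a nonzero code means 'block the action'."""
    # "tool_input" is an assumed key name for the pending action's arguments.
    for text in collect_strings(payload.get("tool_input", {})):
        if EMOJI_RE.search(text):
            return 2, "Blocked: emoji found in tool input."
    return 0, ""


def run(stream):
    """Read one JSON payload from stream, report to stderr, return the exit code."""
    code, message = check(json.load(stream))
    if message:
        print(message, file=sys.stderr)
    return code

# A real hook script would end with: sys.exit(run(sys.stdin))
```

The decision is made by `check`, which is ordinary code running outside the model, so no wording in a prompt can change its result. What it cannot do is catch anything the pattern does not cover, which is the sense in which it is only as good as the script.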

esperent, yesterday at 12:07 PM

I mean, that's like asking whether you're sure your antivirus would prevent every possible virus. Are you sure that you haven't made some mistake in your dev box setup that would allow a hacker to compromise it? What if a thief broke into your house and stole your laptop? That's happened to me before, and it was much more annoying to recover from than an accidental rm -rf.

I do my best to keep off-site backups and don't worry about what I can't control.
