> One thing I'd love to hear opinions on: are there significant security differences between models like Opus and Sonnet when it comes to prompt injection resistance?
Is this a worthwhile question when it’s a fundamental security issue with LLMs? In meatspace, we fire Alice and Bob if they fail too many phishing training emails, because they’ve proven they’re a liability.
You can’t fire an LLM.
It's a fundamental issue I agree.
But we don't stop using locks just because all locks can be picked. We still pick the better lock. Same here, especially when your agent has shell access and a wallet.
with openclaw... you CAN fire an LLM. just replace it with another model, or soul.md/idenity.md.
It is a security issue. One that may be fixed -- like all security issues -- with enough time/attention/thought&care. Metrics for performance against this issue is how we tell if we are going to correct direction or not.
There is no 'perfect lock', there are just reasonable locks when it comes to security.
Yes, it’s worthwhile because the new models are being specifically trained and hardened against prompt injection attacks.
Much like how you wouldn’t immediately fire Alice, you’d train her and retest her, and see whether she had learned from her mistakes. Just don’t trust her with your sensitive data.