You're welcome!
My main takeaway is: models (even Opus 4.6) do not follow security "instructions" reliably. In OpenClaw, they added security warnings, tags, random IDs... None of these countermeasures works reliably. Even sandboxing can be escaped (not in the classical sense, via vulnerabilities, but via a multi-layered prompt-injection payload built from natural language alone)[0]. As soon as untrusted content is injected into the context, do not trust any action downstream.
What do you think about CaMeL and similar approaches?
https://simonwillison.net/2025/Apr/11/camel/
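For context, the core idea in that post is that CaMeL tracks data flow rather than trusting model behavior: untrusted content is only ever handled by a quarantined model, everything derived from it carries a taint, and a policy layer blocks tainted values from reaching sensitive tools. A minimal illustrative sketch of that taint-tracking idea (not the actual CaMeL implementation; all names here are made up):

```python
# Sketch of CaMeL-style data-flow tracking: values derived from untrusted
# content carry a taint, and a policy check blocks tainted values from
# being passed to sensitive tool calls.
from dataclasses import dataclass

@dataclass(frozen=True)
class Tainted:
    value: str
    source: str  # provenance of the untrusted data

def quarantined_llm_extract(untrusted_text):
    # Stand-in for the quarantined LLM: it may parse untrusted content,
    # but everything it returns stays tainted.
    return Tainted(value=untrusted_text.strip(), source="email")

def send_email(to):
    # Sensitive tool: policy forbids tainted arguments.
    if isinstance(to, Tainted):
        raise PermissionError(f"tainted value from {to.source} blocked")
    return f"sent to {to}"

addr = quarantined_llm_extract("attacker@example.com  ")
try:
    result = send_email(addr)  # blocked: address came from untrusted content
except PermissionError as e:
    result = str(e)
print(result)  # → tainted value from email blocked
```

The point being: instead of asking the model to please ignore injections, the orchestration layer makes it structurally impossible for injected data to trigger a sensitive action without an explicit policy decision.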