logoalt Hacker News

sunaookamilast Friday at 9:55 PM1 replyview on HN

Awesome and very interesting posts, thanks for sharing! Always reminds me of the "lethal trifecta": https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/


Replies

veganmosfetyesterday at 9:48 AM

You're welcome!

My main takeaway message is: models (even opus4.6) do not follow security "instructions" reliably. In OpenClaw, they added security warnings, tags, random IDs... None of these countermeasures work reliably. Even sandboxing can be escaped (not in the classical sense using vulnerabilities, but using multi-layered prompt injection payload with natural language only)[0]. As soon as untrusted content is injected in the context, do not trust any actions downstream.

[0] https://itmeetsot.eu/posts/2026-02-15-openclaw_sandbox/