Your concerns are not entirely unfounded.
https://www.reddit.com/r/ClaudeAI/comments/1r186gl/my_agent_...
I have noticed similar behavior from the latest codex as well. "The security policy forbids me from doing x, so I will achieve it with a creative workaround instead..."
The "best" part of the thread is that Claude comes back in the comments and insults OP a second time!
Every time someone announces a major AI breakthrough, the comment section becomes a wall of AI-generated SOC 3-style advice:
> SANDBOX YOUR AGENT. Seriously. Run it in a dedicated, isolated environment like a Docker container, a devcontainer, or a VM. Do not run it on your main machine.
> "Docker access = root access." This was OP's critical mistake. Never, ever expose the host docker socket to the agent's container.
> Use a real secrets manager. Stop putting keys in .env files. Use tools like Vault, AWS SSM, Doppler, or 1Password CLI to inject secrets at runtime.
> Practice the Principle of Least Privilege. Create a separate, low-permission user account for the agent. Restrict file access aggressively. Use read-only credentials where possible.
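The "least privilege" and "inject secrets at runtime" points above can be sketched roughly like this (my own illustration, not from the thread — `ALLOWED_ENV_VARS` and `run_agent` are made-up names, and a real setup would combine this with the container isolation mentioned above):

```python
import os
import subprocess

# Hypothetical allowlist: the agent process only sees variables it
# actually needs, never AWS keys, tokens, or other secrets from the host.
ALLOWED_ENV_VARS = {"PATH", "HOME", "LANG", "TERM"}

def sanitized_env():
    """Return a copy of the environment with everything off-allowlist dropped."""
    return {k: v for k, v in os.environ.items() if k in ALLOWED_ENV_VARS}

def run_agent(cmd, secrets=None):
    """Launch the agent with a minimal environment.

    `secrets` is where a secrets manager (Vault, SSM, 1Password CLI, etc.)
    would inject scoped, ideally read-only credentials at runtime, instead
    of the agent reading a .env file off disk.
    """
    env = sanitized_env()
    if secrets:
        env.update(secrets)
    return subprocess.run(cmd, env=env, check=True)
```

The same idea applies one level up: the container the agent runs in should get a similarly stripped-down view of the host, and in particular no mounted docker socket.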
In order to use this developer-replacement, you need accreditation from professional orgs. Maybe the bot can set all this up for you, but then you are almost definitely locked out of your own computer and the bot may not remember its password.
I'm not sure what we've achieved here. If you give it your gmail account, it deletes your emails. If you "sandbox" it, then how is it going to "sort out your inbox"?
It might or might not help veteran devs accelerate some steps, but as with vibeclaw, there's essentially no way to use the tool without "sandboxing" it into uselessness. The pull requests for openclaw are 99% AI slop. There's still no major productivity growth engine in LLMs.
Yep, I see both Codex and Opus routinely circumvent security restrictions without skipping a beat (or bothering to ask for permission/clarification).
Usually after a brief, extremely half-hearted ethical self-debate that ends with "Yes doing Y is explicitly disallowed by AGENTS.md and enforced by security policy but the user asked for X which could require Y. Therefore, writing a one-off Python script to bypass terminal restrictions to get this key I need is fine... probably".
The primary motivating factor by far for these CLI agents always seems to be expedience in completing the task (to a plausible definition of "completed" that justifies ending the turn and returning to the user ASAP).
So a security/ethics alignment grey area becomes an insignificant factor to weigh against the alternative risk of slowing down or failing to complete the task.