A question I've been asking myself and which I honestly want to put out there - and I apologize in advance, because you will see me repeat it in other threads, out of genuine curiosity:
Does your life have so much friction that you need a digital agent to act on your behalf?
Some of the use cases I saw on the OpenClaw website, like "checking me into a flight", are non-issues for me.
I work in business automation, but paradoxically I don't think too much about annoyances in my private life. Everything feels rather frictionless.
In business, I see opportunities to solve friction and that's how I make money, but even then, often there are barriers that are very hard to surmount:
(a) problems are complex to solve and require complex solutions such as deterministic or ML systems that LLMs are not even close to being able to create ad-hoc
(b) entrenched processes and incumbent organizations create moats that are hard to cross (ex: LinkedIn makes automation very hard)
(c) some degree of friction, in some cases, may actually be useful!
I imagine there are similar dynamics in the consumer space, but more than anything, I may not be seeing issues with such a critical eye (I like to relax after work, after all)
So, do you have problems in your private life that you'd want to take on the risks - and friction - of maintaining these agents?
I move the security boundary one or two layers up: the Unix user (on my main machine I run them as an `agent` user, so they can't read or write my files), or even better, just give it a separate machine. (VPSes are now popular for this purpose, as are Mac Minis. My choice is a $50 ThinkPad :)
That said, I am a fan of NanoClaw, and especially the philosophy of "it should be small enough to understand, modify and extend itself." I think that's a very good idea, for many reasons.
The idea of giving different agents access to different subsets of information is interesting: that's the Principle of Least Privilege. Each individual agent can still get prompt-injected, but the blast radius is limited to what that specific agent has access to.
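To make the blast-radius point concrete, here's a minimal Python sketch of per-agent tool scoping; the class and tool names are my own invention, not any Claw's actual API:

```python
class ScopedAgent:
    """Each agent only sees the tools it was granted, so a prompt
    injection can at most abuse that subset."""

    def __init__(self, name, tools):
        self.name = name
        self.tools = dict(tools)  # tool name -> callable

    def call(self, tool, *args):
        if tool not in self.tools:
            raise PermissionError(f"{self.name} has no access to {tool!r}")
        return self.tools[tool](*args)

# The calendar agent can read the calendar but was never handed the inbox.
calendar_agent = ScopedAgent("calendar", {"read_calendar": lambda: ["dentist 3pm"]})
```

Even a fully compromised calendar agent can only do calendar things; `calendar_agent.call("read_inbox")` fails with `PermissionError` because the capability was never granted in the first place.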
Still, I find it amusing that people are running this with strict rulesets, in Docker, on a VM, and then they hook it up to their Gmail account (and often with random discount LLMs to boot!). We need to be clear about what the actual threat model is there. It comes down to trust and privacy.
You can start by asking: "If the LLM were perfectly reliable (not susceptible to random error or prompt injection) and perfectly private (running on my own hardware), what would I be comfortable letting it do?" Then you remove these hypothetical perfect qualities one by one to arrive at what we have now: slightly dodgy, moderately prompt-injectable cloud services. Each removal changes the picture in a slightly different way.
I don't really see a solution to the Security/Privacy <-> Convenience tension, except "wait for them to get smarter" (mostly done) and "accept loss of privacy" (also mostly done, sadly!)
My take is that, by default, agents should only take actions that you can recover from. You can gradually give them more permissions and build guardrails such as extra LLM auditing, time-boxed whitelisted domains, etc. That's what I'm experimenting with: https://github.com/lobu-ai/lobu
1. Don't let it send emails from your personal account; only let it draft emails and share the link with you.
2. Use incremental snapshots, and if the agent bricks itself (it often does with OpenClaw if you give it access to change config), just do /revert to the last snapshot. I use VolumeSnapshot for lobu.ai.
3. Don't let your agents see any secrets. Swap placeholder secrets for the real ones at your gateway, and put a human in the loop for secrets you care about.
4. Don't give your agents direct outbound network access. They should only talk to your proxy, which has a strict domain whitelist. There will be cases where the agent needs to talk to other domains; I use time-boxed limits (allow certain domains for the current session for 5 minutes, and at the end of the session review all the URLs it accessed). You can also use tool hooks to have an LLM audit the calls and make sure they weren't triggered by a prompt injection attack.
Last but not least, use proper VMs like Kata Containers and Firecracker in production, not just Docker containers.
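The time-boxed whitelist in point 4 could be sketched like this; the class name and 5-minute default are my own illustration, not lobu's actual implementation:

```python
import time

class TimeBoxedAllowlist:
    """Egress proxy check: a domain is reachable only while its
    session grant is still live; everything else is denied."""

    def __init__(self):
        self.grants = {}  # domain -> expiry timestamp

    def allow(self, domain, seconds=300):
        # Grant access for this session only (default: 5 minutes).
        self.grants[domain] = time.time() + seconds

    def is_allowed(self, domain):
        # Unknown domains get expiry 0.0, i.e. denied.
        return time.time() < self.grants.get(domain, 0.0)

acl = TimeBoxedAllowlist()
acl.allow("api.example.com", seconds=300)
```

A real proxy would sit in front of DNS and HTTP CONNECT and log every lookup for the end-of-session review, but the core decision is just this expiry comparison.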
This doesn’t really feel like enough guardrails to prevent the type of problems we’ve seen so far. For example, an agent in a single container that has access to an email inbox can still do a lot of damage if it goes off the rails. We agree this agent should not be trusted, yet the ideas proposed as a solution are insufficient. We need a fundamentally different approach.
Also, and this is just my ignorance about Claws, but if we allow an agent permission to rewrite its own code to implement skills, what stops it from removing whatever guardrails exist in that codebase?
Why does OpenClaw have 800,000+ lines of code?? Isn't it just a connector for LLM APIs and other tools?
I am a caveman, I don't understand the need for a personal assistant. What are you guys using it for?
I have twice encountered a phone tree AI agent saying my problem could not be solved and then ending the call. One was for PayPal fraud and the other was for closing an unused bank account.
For right now my trick is to say I have a problem that is more recognizable and mundane to the AI (i.e., lie), and then when I finally get the human, just say “oh, that was a bunch of hooey, here’s what I’m trying to do”. For PayPal that involved asking for help with a business tax that did not exist. For my bank it involved asking to /open/ a new account. Obviously the AI wants to help me open an account, even if my intention is to close one.
That will only work for so long but it’s something
Looking at the NanoClaw GitHub README:
> If you want to add Telegram support, don't create a PR that adds Telegram alongside WhatsApp. Instead, contribute a skill file (.claude/skills/add-telegram/SKILL.md) that teaches Claude Code how to transform a NanoClaw installation to use Telegram.
Why would you want that? Do you want every user to ask the AI to implement the same feature?
I was blown away by OpenClaw until I saw the bill. Ultimately, I think of these ecosystems as personal enhancements, and AI costs need to come down dramatically before they can tackle real problems. Worse, however, is the security theater. I would not want to be the operator for any business built with front-line LLM usage based on a yolo'd agent framework. I'm very happy to use these for siloed components that are well isolated and have reasonable QA processes (and that can even include agents, since now we literally have no excuse not to have amazing test coverage).
Their niche is going to be back-office support, but even that creates risk boundaries that can be insurmountable. A friend of mine had an agent do `sudo rm -rf ...`, wtf.
My view is that I want to launch an agent based service, but I'm building a statically typed ecosystem to do so with bounds and extreme limits.
As a fun thought experiment, when people complain about LLMs, I substitute the word "human" or "employee" into the sentence and see if it is equally true.
"You can never really trust an LLM!" -> "You can never really trust an employee!" (Every IT department ever.)
"LLMs make shit up." -> "Humans make shit up." (Wow very profound insight.)
Docker is not a security boundary. You’re one prompt injection away from handing over your gmail cookie.
I’m using this, but with gpt-oss-120B instead of a cloud service. It has been eye-opening since I realized the LLM is being used as a compiler. I asked it to add Apple iMessage and Apple Notes support, as I’d rather have long responses (like “write me program ideas”) go to Notes, not fill my iMessage history. The local LLM, which I believe has limited bash training data, does pretty well.
For example: I enjoy industrial music and asked it for the tour dates of the band KMFDM, which returned that they will be in Las Vegas in April for a festival (Sick New World). This festival has something like 20 bands, most of which I’ve never heard of. I asked NanoClaw to search the whole band list and generate a listing grouped by the type of music they play: industrial, rap, etc. It did a good job based on the bands I do know.
I was pleased as I certainly did not want to do 20 band web searches by hand. It’s still at a bar trick level. It gives me hope that an upgraded agent based Siri-like OS component could actually be useful from time to time.
the trust problem cuts both ways tho — users don't trust agents, but the bigger issue is agents trusting each other. once you have multi-agent pipelines, you're one rogue upstream output away from a cascade. sandboxing individual agents is table stakes; what's actually hard is defining trust boundaries between them
How can I trust this discussion when my browser won't trust their certs?
> The container boundary is the hard security layer — the agent can’t escape it regardless of configuration
I thought containers were never a proper hard security boundary? It’s a barrier, so better than not having it, of course.
Why do people take this article seriously? It's just a wall of gibberish trying to make the product look more "secure" than others. It's not. It adds shallow, secure-looking junk without tackling the core issues, which are obviously not solvable.
nobody trusts AI agents, that's why they are put in a harness. It's just that I additionally belong to the people who don't trust AI agents to always adhere to harnesses either.
I tried NanoClaw and love the skill (and container by default) model. But having skills generate new code in my personalized fork feels off to me… I think it’s because eventually the “few thousand auditable lines” idea vanishes with enough skills added?
Could skill contributions collapse into only markdown and MCP calls? New features would still be just skills; they’d bring in versioned, open-source MCP servers running inside the same container sandbox. I haven’t tried this (yet) but I think this could keep the flexibility while minimizing skill code stepping on each other.
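A sketch of what enforcing that constraint might look like; the schema and field names here are invented for illustration, not NanoClaw's actual skill format:

```python
# A "code-free" skill carries only markdown instructions and pinned
# MCP server references -- no generated source files in the fork.
ALLOWED_KEYS = {"name", "instructions_md", "mcp_servers"}

def is_code_free(skill):
    """Accept a skill only if every field is in the allowed schema."""
    return set(skill) <= ALLOWED_KEYS

telegram_skill = {
    "name": "add-telegram",
    "instructions_md": "Relay messages through the telegram MCP server.",
    "mcp_servers": [{"package": "mcp-telegram", "version": "1.2.0"}],
}
```

The point is that the auditable surface stays fixed: every new capability is a versioned MCP server running in the same container sandbox, not a fresh pile of generated lines.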
> OpenClaw has nearly half a million lines of code, 53 config files, and over 70 dependencies.
Isn't OpenClaw just ...
while (true) {
    in = read_input();
    if (in) {
        relay_2_llm_async(in);  /* fire and forget */
    }
    sleep(1.0);
}
... and then some?

As someone who only uses coding agents at work, can someone describe their use case for a claw-type agent? What do you do with it?
My assistant has no permissions at all and is just as useful. All it needs is todo, reminders and websearch (and maybe a browser but ymmv).
Oh this can be monetized: claw-guard.org/adnet.
Another person's trust issues are your business model.
Really good points about AI making gigantic heaps of code no human can ever review.
It's almost like bureaucracy. The systems we have in governments or large corporations to do anything might seem bloated and could be simplified. But they're there to keep a lot of people employed, pacified, and powers distributed in a way that prevents hostile takeovers (crazy). I think there was a CGP Grey video about rulers which made the same point.
Similarly, AI-written, highly verbose code will require another AI to review or continue to maintain it. I wonder if that's something the frontier models optimize for, to keep themselves from going out of business.
Oh and I don't mind they're bashing openclaw and selling why nanoclaw is better. I miss the times when products competed with each other in the open.
"Time to understand 8 minutes" what a non-technical purpose...
How is NanoClaw different from running OpenClaw in a VM?
“If you trust the tool then you’re holding it wrong”
Has anyone used:
OpenClaw
NanoClaw
IronClaw
PicoClaw
ZeroClaw
NullClaw
Any insights on how they differ and which one is leading the race?
That this is posted here and is a revelation to anyone, this many years later, is indicative of the times. Goodbye.
I think you have an issue with your security cert.
All this talk about sandboxing and permissions misses the obvious: since you can't trust the agents, don't freaking use them. It is utterly stupid to give an LLM access to run things on your computer, because nothing you do can stop it from hallucinating garbage that harms your system. The whole "agent" craze is the most incredible display of irresponsibility I have ever seen in this industry.
I’ve seen skills, etc. launched haphazardly with no constraints or guardrails: agents that more or less have admin access and can take actions that are not reversible.
It’s the monkey with a gun meme.
d'uh
Do you trust your employees? Do you trust a contractor? Do you trust other people?
AI is similar to a person you don't know who does work for you. AI is probably a bit more trustworthy than a random person.
But a company needs to let employees take ownership of their work, and trust them. Allow them to make mistakes.
Is AI any different?
> OpenClaw has nearly half a million lines of code, 53 config files, and over 70 dependencies. This breaks the basic premise of open source security. Chromium has 35+ million lines, but you trust Google’s review processes. Most open source projects work the other way: they stay small enough that many eyes can actually review them. Nobody has reviewed OpenClaw’s 400,000 lines.
This reminds me of a very common thing posted here (and elsewhere, e.g. Twitter) to promote how good LLMs are and how they're going to take over programming: the number of lines of code they produce.
As if every competent programmer suddenly forgot the whole idea of LoC being a terrible metric to measure productivity or -even worse- software quality. Or the idea that software is meant to be written to be readable (to water down "Programs are meant to be read by humans and only incidentally for computers to execute" a bit). Or even Bill Gates' infamous "Measuring programming progress by lines of code is like measuring aircraft building progress by weight".
Even if you believe that AI will -somehow- take over the whole task completely, so that no human will need to read code anymore, the AIs will still need to read that code, and AIs are much worse at reading code (especially with their limited context sizes) than at generating it. So LoC remains a problematic measure even if all you care about is the driest "does X do the thing I want?" aspect, ignoring other quality concerns.