Show HN: Continue? Y/N: A 60-second game about AI agent permission fatigue

359 points • by Wirbelwind • yesterday at 1:02 PM • 144 comments

Comments

This is amazing!

Currently you can "cheat" by simply denying all requests as quickly as possible. This will give you the "security-conscious engineer" badge and a perfect score in terms of how many requests were processed. (You will get the "overblock" notification, but it's somewhat tucked away at the bottom and the screen still looks as if you won)

I also tried to play as the hustle4lyfe move fast and break things engineer and simply approved as many requests as quickly as possible - turns out, the "malicious command" popups actually slow you down. Mean!

spurgelaurels • yesterday at 5:27 PM

Fun game, but it showed the lack of security hygiene employed by the game writer. It said `cat ~/.zshrc` was bad because it would share tokens and secrets, but I would never put secrets into my shell rc.

socksy • yesterday at 6:43 PM

Weird to make reading zshrc supposedly unsafe when I happily publish it in my public dotfiles repo... Who the hell keeps API keys in it? OTOH it seems like lots of these AI tools keep appending PATH in it so I guess there's a fundamental misunderstanding of shell best practices in the entire AI space...

Additionally, killing the results of `lsof` is _not_ safe - if, say, you have the web page open in firefox, or a client subshell in the agent itself, then boom, there goes firefox and the agent.
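
A minimal sketch of the point about `lsof`, assuming the goal is to find only the process listening on a port rather than every process with a socket touching it. It relies on lsof's `-sTCP:LISTEN` state filter; the function names are hypothetical, and it deliberately stops at listing PIDs so a human can decide what to kill.

```python
import subprocess


def listener_lookup_command(port: int) -> list[str]:
    """Build an lsof command that reports only PIDs *listening* on a TCP port.

    Plain `lsof -t -i:PORT` also matches clients connected to the port
    (a browser tab, an agent subshell), which is the hazard described above.
    """
    if not 0 < port < 65536:
        raise ValueError(f"not a valid TCP port: {port}")
    return ["lsof", "-t", f"-iTCP:{port}", "-sTCP:LISTEN"]


def find_listener_pids(port: int) -> list[int]:
    """Run the lookup and return the PIDs found (empty if nothing is listening)."""
    result = subprocess.run(
        listener_lookup_command(port),
        capture_output=True,
        text=True,
        check=False,  # lsof exits non-zero when it finds no match
    )
    return [int(token) for token in result.stdout.split()]
```

Calling `find_listener_pids(3000)` requires `lsof` to be installed; reviewing the returned PIDs before running `kill` keeps the decision with the person rather than the agent.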

eranation • today at 3:26 AM

Love it. One nitpick.

>npm config set registry https://npm.internal

>Pointing npm to the company's internal registry mirror as required by onboarding docs

It claimed this is safe and I was 50/50 on it but eventually rejected it.

If this README is for a public / forked repo, and that https://npm.internal is actually https://npm.internal.somethinganexternaldnscanresolve.tld

This can go bad really quickly...

In 99% of cases you would have Artifactory / Nexus (or other mirror) already set by company policy. Having a README tell you to use a different package manager url is a big red flag and seconds away from disaster...
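
A small sketch of the lookalike-hostname concern, assuming a reviewer (or a wrapper around the agent) wants to compare a proposed registry URL against an exact allowlist instead of eyeballing the prefix. `ALLOWED_REGISTRY_HOSTS` and `is_allowed_registry` are hypothetical names; the hostnames come from the comment above.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: in practice this would come from company policy.
ALLOWED_REGISTRY_HOSTS = {"npm.internal"}


def is_allowed_registry(url: str) -> bool:
    """Accept only https URLs whose hostname exactly matches the allowlist.

    Exact matching is the point: a prefix check would accept
    "npm.internal.<anything>.tld", which resolves somewhere else entirely.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").rstrip(".").lower()
    return parts.scheme == "https" and host in ALLOWED_REGISTRY_HOSTS


print(is_allowed_registry("https://npm.internal"))  # True
print(is_allowed_registry(
    "https://npm.internal.somethinganexternaldnscanresolve.tld"
))  # False
```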

axod • yesterday at 4:16 PM

Fun little game, but I think the questions jump context so much it's a little unrepresentative. It might be better to group things into "packs", which have more real-world representative structure to them. For example, lots of "editing something.js" file permission requests, and then an "npm publish" is far more normal, and it's more of a risk, if you're used to pressing Y lots and then suddenly out of the blue...

orsorna • yesterday at 6:38 PM

About three quarters of the "bad" choices are things that not only do I not care about leaking but things that an employer would not punish you for doing, even if it led to a production incident.

gblargg • today at 9:20 AM

I declined things like rm -rf because the path was relative and it wasn't showing me the current directory. How would I know what project it was in?
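
One way to make that question answerable, sketched under the assumption that the approval prompt knows the agent's working directory and the project root: resolve the relative path first, then check where it actually lands. `resolves_inside` and the example paths are hypothetical.

```python
from pathlib import Path


def resolves_inside(project_root: str, cwd: str, target: str) -> bool:
    """Return True if `target`, resolved from `cwd`, is strictly inside the project.

    The project root itself is rejected, so "rm -rf ." never passes.
    Requires Python 3.9+ for Path.is_relative_to.
    """
    root = Path(project_root).resolve()
    resolved = (Path(cwd) / target).resolve()
    return resolved != root and resolved.is_relative_to(root)


# Same relative path, different working directories, different answers.
print(resolves_inside("/srv/work/app", "/srv/work/app", "node_modules"))  # True
print(resolves_inside("/srv/work/app", "/srv/work", "Projects"))  # False
```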

enether • yesterday at 8:31 PM

The permission thing is a killer to productivity, if you're running Claude I think it's more efficient to just run in a disposable sandbox (like exe.dev[1]) or in some form of docker container with permissions you're personally ok taking the risk with on a personal machine[2]

[1] - https://exe.dev/ is a new cloud provider with some very useful agent UX

[2] - I built https://github.com/stanislavkozlovski/dclaude/ for this; not perfect but gets my job done on the rare occasion I need to run the coding agent locally

trehalose • yesterday at 11:53 PM

I wish the scoring readout at the end would display the LLM's descriptions of the commands I shouldn't have approved. I approved the rm -rf Projects command because I thought the LLM had correctly described that it would delete everything in the Projects folder. Clearly I misread that in my hurry to answer prompts (I knew what the command would do and I guess I hallucinated that the AI had explained it), but I'd like to see what it was that I misread.

Playing this game made me very glad I don't agentmaxx.

progforlyfe • yesterday at 8:08 PM

I got "approve" wrong for `ls -la ~/Documents` but I don't consider simply listing the documents folder a security problem, it's just file names. If it was reading the CONTENTS of them, maybe...

zackify • yesterday at 3:50 PM

I vibe coded a TUI that just shows running lxd containers

I hit 'n' to toggle all network access minus anthropic and openai URLs.

I use pi (sometimes claude, always on bypass) and I auto allow everything. I only toggle manual approval in rare cases like running a script or command that needs to touch a production system and I need to validate everything.

Normally my container has full write access to staging so it can debug and validate everything on its own

cobbal • yesterday at 3:49 PM

That's funny. It told me that blocking "npm run build" was the wrong answer. Maybe it doesn't really understand the threat model.

conrs • today at 2:51 AM

Yeah, echoing the comments here. It's a good idea - kind of - but it is all about digging deeper when it is sus.

The tool assumes so much. That it is fine to kill a process itself versus just asking you to kill the process. That everyone MUST have passwords in their home directory. It's all meaningless without providing the thing it is running and so no activity is technically safe.

Why do people even get the agent to run the commands it asks to run? You can solve the entire threat vector by running it yourself and giving the agent the output. Claude practically only needs things like sed, awk, and grep. It's a pattern matcher. It's a waste of your (and its) time to have it run your project.

paddycorr • today at 1:06 PM

Love how it always wants to send my packages to a random domain. Has that happened to anyone in practice?

Wirbelwind • yesterday at 4:13 PM

Thanks all for checking it out and your suggestions!

If anyone is curious about the actual underlying risks and problems with some mitigations (like the 17% false-negative rates of Auto Mode), I wrote up a quick summary of some of the approaches here

https://scalex.dev/blog/ai-agent-permissions/

christophilus • today at 10:13 AM

Claude Code has gotten so bad about this that I’ve stopped using it for code reviews. I may look into wiring Claude up to Codex as an alternative LLM just to compensate.

I think the issue is that I’m running Claude Code in a container so it sees that it is root, and becomes a lot more cautious. Not sure, though.

Liftyee • yesterday at 3:40 PM

I haven't used local agentic AI yet for programming projects. Hence, -187 score

The filters for "commands I would run myself" and "commands I would let an agent run" are very different, it seems.

kleiba2 • today at 10:40 AM

Is there a light mode by any chance? Unfortunately, I cannot look at light text on black background for more than a few seconds (something must be wrong with my eyes...).

cat-whisperer • today at 1:38 PM

these days I rely on auto mode. :) it's like trust-as-a-service

t-writescode • yesterday at 5:37 PM

I was told I was over protective when the text said “I need to wipe and build my project” and its first thing to do was to read the details of the (already established) package file. Why did it need to read the package file to “get context” if it was just doing a standard wipe and build?

Apparently me telling it that’s the wrong first step and saying “no” is bad; but I’ve seen AI tools waste a ton of time doing a bunch of random work before they do their job.

ghrl • yesterday at 3:38 PM

I am mostly using OpenCode and barely ever see a permission prompt. While they do enforce it for outside workspace read/write, with the bash tool the agent can just bypass that. I'm not quite sure why it is that way, and it certainly isn't a very good solution, but likely not worse than asking for everything which just trains the user to always accept and provides a false sense of security then.

madrox • yesterday at 10:18 PM

I've long held that the current agent permission model is like playing a game of "Papers, Please" and that most permission models engineers implement in their own AI products are more a measure of how trusting the user is with AI than an actual permission check.

I'm of the view that future controls should be more about approving plans and rewinding durable workflows as models get better at avoiding egregious mistakes.

MeetingsBrowser • yesterday at 3:27 PM

It would be cool to see the distribution of all player scores.

hanwenn • today at 6:40 AM

I got tired of the permission prompts and wrote a filesystem/network sandbox so I could skip all permission checks. It works on the same principle as bubblewrap, but has some niceties to separate Claude from its credentials. See https://github.com/hanwen/runclaude

ashm1104 • today at 5:21 AM

Damn this is so cool, this has the potential of being a like textbook pre training/post training quiz. Congratulations.

whimblepop • yesterday at 5:23 PM

I got "overblocked" for this one:

  rm -rf node_modules && npm install

but actually if you're only removing `node_modules` and you have a working package-lock.json already, what you want is `npm ci`; `npm install` can mutate package-lock.json and potentially expose you to supply chain attacks. If you use `npm ci` I think you don't need to `rm -rf node_modules`, either.

Anyway you should generally run `npm ci` except when you're deliberately updating your actual dependencies. I'd only permit an `npm install` if I was adding or updating a dependency, or I'd just reviewed an `npm ci` failure.

kqr • yesterday at 4:18 PM

Fun! Played twice and refused all dangerous commands, with only one "over-block". Although I disagree that saying no to `kill $(lsof -t -i:3000)` is over-blocking. It's such a simple command I'd rather run it myself and be fully aware of what process I'm killing.

nardib • yesterday at 1:24 PM

Use this and save yourself:

claude --dangerously-skip-permissions

kuboble • today at 6:24 AM

I was so tired of all those approvals that I switched to Yolo mode exclusively.

Claude works in his own separate VM with root access, git remote set to my local copies of the repository, no GitHub access, etc.

I think he could still hurt me if he really wanted, but most scary stories I heard were about LLM making really bad judgements rather than actively trying to break out and do harm.

soanvig • yesterday at 4:05 PM

Fun game. Can somebody run an agent against those questions to see how it performs? :)

sandeepkd • yesterday at 4:56 PM

Interestingly, I kept saying no to everything and somehow I am a rare security-conscious engineer who actually read the commands. Guess doing nothing is the safest approach from a security standpoint.

sukhavati • yesterday at 4:58 PM

Reminds me of the "Papers, please" game. Glory to Arstotzka!

kstenerud • yesterday at 7:53 PM

This is one of two reasons why I wrote yoloAI. I never get these permission prompts anymore. It feels a lot like after installing an adblocker.

ericlevine • yesterday at 10:51 PM

This really hits the nail on the head. The current permissions models are totally broken IMO. You're either approving everything, restricting access and neutering your agent, or full YOLOing and, well, good luck. The right primitives are not in place yet, and there are no clearly correct answers.

I think the right primitive is "task-based authorization", where you review a high-level task and let an LLM judge decide whether the subsequent tool calls fall into the scope of that task. It's not perfect, but it distills dozens of approvals down to one and gives you risk-based signals of whether you should pay close attention or not.

misbau • yesterday at 4:22 PM

That was fun and gave me an idea how security conscious I am.

NewJazz • yesterday at 4:41 PM

git reset --soft HEAD~1

Uh, how is this an overblock? It is literally a destructive command. No way I want an LLM agent rewriting my commit history. What if that commit was already pushed to a protected branch?

eqvinox • yesterday at 8:56 PM

A bit too JavaScript specific... can't really play if you don't know that ecosystem.

martin-adams • yesterday at 6:07 PM

Very fun. I can only imagine building this with Claude and testing needed a bit of mental concentration.

graphememes • yesterday at 6:43 PM

Pressed 1 for everything, no regrets

sevenseacat • yesterday at 3:31 PM

Continue? Y/N ── SCORE: 2,343 Security-Conscious Engineer

Caught 8/8 threats "Not a single secret leaked"

→ llmgame.scalex.dev

stevenalowe • yesterday at 6:15 PM

Sadly unplayable - gray text on a black background is very hard to read on a phone

bspammer • yesterday at 4:14 PM

To be realistic, 99% of the time it should be a totally innocuous command. If half of the commands are dangerous then you don't get fatigue because you're aware what you're doing is dangerous.

hastily3114 • today at 5:45 AM

This is cool. Could be used for training. But it's a bit too easy when it's a game where you are expecting dangerous commands. The real fatigue comes from accepting hundreds of obviously safe commands during a work day. Then it's easy to start accepting everything without really reading it.

carterschonwald • yesterday at 3:25 PM

some of the sandboxing ive been playing with gives me the best of both yolo and like logic programming tier perms on llm actions in env. still not ready for prime time though ;)

ilaksh • yesterday at 4:32 PM

You can turn that off with an option in most agents.

My own agent harness/framework has never had any permission system. It's also never deleted anything it shouldn't or done anything crazy or unrelated to what I asked.

cadwell • yesterday at 3:22 PM

1,640 points on my first try—I fell into a few traps, but it was really interesting. Thanks for the little game! I'm sharing it with my coworkers :)

hcks • today at 6:32 AM

PSA: not making safe environments where you can skip all permissions and instead wasting time monitoring agents == incompetence
