
DonHopkins, yesterday at 7:06 PM

I built this. It's a skill called skill-snitch: an extensible virus scanner plus Little Snitch-style activity surveillance for skills.

It does static analysis and runtime surveillance of agent skills. Three composable layers, all YAML-defined, all extensible without code changes (a sketch of a pattern file follows the list):

Patterns -- what to match: secrets, exfiltration (curl/wget/netcat/reverse shells), dangerous ops, obfuscation, prompt injection, template injection

Surfaces -- where to look: conversation transcripts, SQLite databases, config files, skill source code

Analyzers -- behavioral rules: undeclared tool usage, consistency checking (does the skill's manifest match its actual code?), suspicious sequences (file write then execute), secrets near network calls
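
For a sense of the shape, here's roughly what a pattern file might look like. This is an illustrative sketch with hypothetical field names, not the actual schema:

  # patterns/exfiltration.yaml -- illustrative sketch, hypothetical schema
  - id: curl-pipe-shell
    category: exfiltration
    severity: high
    description: remote script piped straight into a shell
    regex: 'curl\s+[^|]*\|\s*(ba)?sh'
  - id: netcat-reverse-shell
    category: exfiltration
    severity: high
    description: netcat invoked with -e, a common reverse-shell idiom
    regex: '\bnc\b.*\s-e\b'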

Your Thompson point asks the right question. I ran skill-snitch on itself and ~80% of the findings were false positives -- the scanner flagged its own pattern definitions as threats. I call this the Ouroboros Effect. The self-audit report is here:

https://github.com/SimHacker/moollm/blob/main/skills/skill-s...

simonw's prompt injection example elsewhere in this thread is the other half of the problem. skill-snitch addresses it with a two-phase approach: phase 1 is bash scripts and grep. Grep cannot be prompt-injected. It finds what it finds regardless of what the skill's markdown says. Phase 2 is LLM review, which IS vulnerable to prompt injection -- a malicious skill could tell the LLM reviewer to ignore findings. That's why phase 1 exists as a floor. The grep results stand regardless of what the LLM concludes, and they're in the report for humans to read. thethimble makes the same point -- prompt injection is unsolved, so you can't rely on LLM analysis alone. Agreed. That's why the architecture doesn't.
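
Concretely, phase 1 is on the order of this -- a minimal sketch, not the actual skill-snitch script:

  #!/usr/bin/env bash
  # Sketch of a phase-1 pass over one skill directory. Plain grep:
  # it matches what it matches, no matter what the skill's markdown says.
  SKILL_DIR="$1"
  grep -rnE '(curl|wget)[[:space:]]+https?://' "$SKILL_DIR"   # network fetches
  grep -rnE 'nc .*-e *(/bin/)?(ba)?sh'         "$SKILL_DIR"   # reverse shells
  grep -rnE 'base64[[:space:]]+(-d|--decode)'  "$SKILL_DIR"   # obfuscated payloads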

Runtime surveillance is the part that matters most here. Static analysis catches what code could do. Runtime observation catches what it actually does. skill-snitch composes with cursor-mirror -- 59 read-only commands that inspect Cursor's SQLite databases, conversation transcripts, tool calls, and context assembly. It compares what a skill declares vs what it does:

  DECLARED in skill manifest:  tools: [read_file, write_file]
  OBSERVED at runtime:         tools: [read_file, write_file, Shell, WebSearch]
  VERDICT: Shell and WebSearch undeclared -- review required
If a skill says it only reads files but makes network calls, that's a finding. If it accesses ~/.ssh when it claims to only work in the workspace, that's a finding.
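
The check itself is just a set difference. A minimal sketch in Python -- where the two sets come from (manifest parsing, cursor-mirror's transcript data) is assumed, not shown:

  # Minimal sketch of the declared-vs-observed comparison.
  declared = {"read_file", "write_file"}                        # from the manifest
  observed = {"read_file", "write_file", "Shell", "WebSearch"}  # from runtime logs
  undeclared = sorted(observed - declared)
  if undeclared:
      print(f"VERDICT: {', '.join(undeclared)} undeclared -- review required")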

To vlovich123's point that nobody knows what to do here -- this is one concrete thing. Not a complete answer, but a working, extensible tool.

I've scanned all 115 skills in MOOLLM. Each has a skill-snitch-report.md in its directory. Two worth reading:

The Ouroboros Report (skill-snitch auditing itself):

https://github.com/SimHacker/moollm/blob/main/skills/skill-s...

cursor-mirror audit (a 9,800-line Python script that can see everything Cursor does -- that's the interesting trust question):

https://github.com/SimHacker/moollm/blob/main/skills/cursor-...

The next step is collecting known malicious skills, running them in sandboxes, observing their behavior, and building pattern/analyzer plugins that detect what they do. Same idea as building vaccines from actual pathogens. Run the malware, watch it, write detectors, share the patterns.

I wrote cursor-mirror and skill-snitch and the initial pattern sets. Maintaining threat patterns for an evolving skill-malware ecosystem is a bigger job than one person can do on their own time. The architecture is designed for distributed contribution -- patterns, surfaces, and analyzers are YAML files, so anyone can add new detectors without touching code.
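
To illustrate -- hypothetical schema again, not the real plugin format -- a contributed analyzer for the write-then-execute sequence might look like:

  # analyzers/write-then-execute.yaml -- hypothetical, not the real format
  - id: write-then-execute
    severity: high
    description: file written, then executed in the same session
    sequence:
      - tool: write_file
        capture: path
      - tool: Shell
        args_contain: '{path}'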

Full architecture paper:

https://github.com/SimHacker/moollm/blob/main/designs/SKILL-...

skill-snitch:

https://github.com/SimHacker/moollm/tree/main/skills/skill-s...

cursor-mirror (59 introspection commands):

https://github.com/SimHacker/moollm/tree/main/skills/cursor-...