logoalt Hacker News

zozbot234yesterday at 10:48 AM1 replyview on HN

You could run them in a container and put access to highly sensitive personal data behind a "function" that requires a human-in-the-loop for every subsequent interaction. E.g. the access might happen in a "subagent" whose context gets wiped out afterwards, except for a sanitized response that the human can verify.

There might be similar safeguards for posting to external services, which might require direct confirmation or be performed by fresh subagents with sanitized, human-checked prompts and contexts.


Replies

brapyesterday at 3:52 PM

So you give it approval to the secret once, how can you be sure it wasn’t sent someplace else / persisted somehow for future sessions?

Say you gave it access to Gmail for the sole purpose of emailing your mom. Are you sure the email it sent didn’t contain a hidden pixel from totally-harmless-site.com/your-token-here.gif?

show 2 replies