logoalt Hacker News

cyanydeezyesterday at 10:44 PM1 replyview on HN

the models will never avoid egregious behavior. think of it like every "good intentions" morality tale. theres almost always some geniune context where that behavior is wanted.

instead, the coding harness or determinative tool, will need hardcoded security features.

in opencode, almost all the power comes from bash and all other permissions are just chrades. its powerful and insecure because of it.

you can sand box them but then you fight the sandbox to pipe in your assets. the sandbox becomes porous because elsewise its useless.

MCPs dont address much either.

want we are looking for is a portal or protocol that has the model and harness and the actions tunneled, like ssh, to some fixed scoped and limited shell along side the assets.

then, the user and LLM can the negotiate assets and actions as needed via the protocol.

but alas, as your comment suggests, people thing theres some perfect context thatll prevent bad things from happening. the libertarian paradise without regulation.


Replies

madroxyesterday at 11:55 PM

I think you're choosing to ignore what I said about the implication of durable workflows, because you seem to be inventing some stories about my comment.

I find that well documented plans do pretty well at aligning AI to what I want it to do, and if it does go astray, as you rightly point out it can still do, it would be sufficient if I can undo it with little pain. We do this kind of thing all the time in CI/CD pipelines.

Even humans can take down production. We have all kinds of guards in place to empower while also defending against the intern accidentally dropping the DB.