
zahlman · yesterday at 5:07 PM

Your local model is still going to get prompt-injected by third parties if it has an Internet connection. It just won't be regularly phoning home to Google/Anthropic/etc.; plenty of other people are still interested in your data (or in convincing the model to encrypt your home directory). There's also still no real accountability anywhere. Even if you have the resources to train the model from scratch yourself, it's not like you can audit the weights and understand any potential malicious behaviour encoded in there, beyond the baseline of "yeah, these things are kinda unpredictable".

And on the flip side, a remote model isn't creating risk in and of itself. That comes from the agent harness being permitted to make network and filesystem calls. Even the most evil possible version of ChatGPT isn't going to exfiltrate anything except by somehow social-engineering you into volunteering the information.
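To make that concrete: in most agent setups, the model only ever emits text, and any real-world effect has to pass through the harness's tool dispatcher. A minimal sketch (the names `run_tool` and `ALLOWED_TOOLS` are made up for illustration, not from any real framework):

```python
# Hypothetical harness sketch: the permission check lives in the harness,
# not the model. Even fully malicious model output stops at the dispatcher
# unless the tool it asks for is explicitly allowed.

ALLOWED_TOOLS = {"read_file"}  # no network tools, no write tools

def run_tool(name: str, arg: str) -> str:
    """Dispatch a tool call requested by the model."""
    if name not in ALLOWED_TOOLS:
        # A remote model asking for http_get or shell access dies here.
        return f"denied: {name!r} is not permitted"
    if name == "read_file":
        with open(arg) as f:
            return f.read()
    return f"unknown tool: {name!r}"
```

The risk surface is whatever set you put in `ALLOWED_TOOLS`, regardless of where the model's weights live.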


Replies

selridge · yesterday at 5:22 PM

That's all true, but it will fall before "[t]he agent has ambient access because it makes it more capable". Folks can shake their heads or worry or whatever, but people vote with their feet, and their feet go where the capability is. Users will follow capability.

It's why people are hooking Open Claw up to stuff and letting it rip -- putting it into a sandbox in a VM in a jail is like getting a brand-new smartphone and setting it to Airplane Mode first thing.