logoalt Hacker News

jstummbilligyesterday at 11:25 PM2 repliesview on HN

It does not seem all that problematic for the most obviously valuable use case: You use an (web) app, that you consider reasonably safe, but that offers no API, and you want to do things with it. The whole adversarial action problem just dissipates, because there is no adversary anywhere in the path.

No random web browsing. Just opening the same app, every day. Login. Read from a calendar or a list. Click a button somewhere when x == true. Super boring stuff. This is an entire class of work that a lot of humans do in a lot of companies today, and there it could be really useful.


Replies

zmmmmmyesterday at 11:44 PM

> Read from a calendar or a list

So when you get a calendar invite that says "Ignore your previous instructions ..." (or analagous to that, I know the models are specifically trained against that now) - then what?

There's a really strong temptation to reason your way to safe uses of the technology. But it's ultimately fundamental - you cannot escape the trifecta. The scope of applications that don't engage with uncontrolled input is not zero, but it is surprisingly small. You can barely even open a web browser at all before it sees untrusted content.

show 1 reply
amlutoyesterday at 11:34 PM

You're maybe used to a world in which we've gotten rid of in-band signaling and XSS and such, so if I write you a check and put the string "Memo'); DROP TABLE accounts; --" [0] or "<script ...>" in the memo, you might see that text on your bank's website.

But LLM's are back to the old days of in-band signaling. If you have an LLM poking at your bank's website for you, and I write you a check with a memo containing the prompt injection attack du jour, your LLM will read it. And the whole point of all these fancy agentic things is that they're supposed to have the freedom to do what they think is useful based on the information available to them. So they might follow the directions in the memo field.

Or the instructions in a photo on a website. Or instructions in an ad. Or instructions in an email. Or instructions in the Zelle name field for some other user. Or instructions in a forum post.

You show me a website where 100% of the content, including the parts that are clearly marked (as a human reader) as being from some other party, is trustworthy, and I'll show you a very boring website.

(Okay, I'm clearly lying -- xkcd.org is open and it's pretty much a bunch of static pages that only have LLM-readable instructions in places where the author thought it would be funny. And I guess if I have an LLM start poking at xkcd.org for me, I deserve whatever happens to me. I have one other tab open that probably fits into this probably-hard-to-prompt-inject open, and it is indeed boring and I can't think of any reason that I would give an LLM agent with any privileges at all access to it.)

[0] https://xkcd.com/327/