Basically, the only way you're separating user input from model meta-input is using some kind of...

cyanydeez • yesterday at 8:22 PM • 2 replies • view on HN

Basically, the only way you're separating user input from model meta-input is by using some kind of character that'll never show up in the output of either users or LLMs.

While technically possible, it'd be like a Unicode conspiracy that had to quietly update everywhere without anyone being the wiser.

Replies

Lerc • yesterday at 11:18 PM

Not at all. You have one set of embeddings for the literal token and another for the metadata. At inference time, all input gets the literal embedding; the metadata embedding can carry provenance data or nothing at all. You have a vector for "user query" in the metadata space, and the inference engine disallows any metadata that is not user input from being close to that vector.

Imagine a model fine-tuned to only obey instructions in a Scots accent, but all non-user input was converted into text first and then read out in a Benoit Blanc speech model. I'm thinking something like that, only less amusing.
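
To make the two-embedding idea concrete, here is a minimal sketch of one way it could look. Everything in it is an assumption made for illustration: the 3-dimensional metadata space, the `PROVENANCE` table, the `MAX_SIMILARITY` threshold, the use of cosine similarity for "close", and the choice to concatenate the literal and metadata embeddings rather than sum them. None of this is taken from a real inference engine.

```python
import math

# Hypothetical 3-dimensional metadata space (a real model would use many more).
USER_QUERY = (1.0, 0.0, 0.0)

# Assumed provenance vectors; None means "no provenance data at all".
PROVENANCE = {
    "user": USER_QUERY,
    "tool": (0.0, 1.0, 0.0),
    "web": (0.0, 0.0, 1.0),
    None: (0.0, 0.0, 0.0),
}

# Assumed cutoff for how similar non-user metadata may be to USER_QUERY.
MAX_SIMILARITY = 0.5


def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def metadata_for(source, vector=None):
    """Return the metadata vector for a token from `source`.

    The engine, not the input, picks the vector. Any vector attached to
    non-user input that lands near USER_QUERY is rejected.
    """
    vec = PROVENANCE[source] if vector is None else tuple(vector)
    if source != "user" and cosine(vec, USER_QUERY) > MAX_SIMILARITY:
        raise ValueError(f"metadata for {source!r} is too close to the user query vector")
    return vec


def embed(literal, source):
    """Concatenate the literal token embedding with its metadata embedding."""
    return tuple(literal) + metadata_for(source)
```

The point of the sketch is that provenance never travels through the text channel: a web page can contain any characters it likes, but it cannot choose which metadata vector its tokens receive.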

zahlman • yesterday at 11:24 PM

Couldn't you just insert tokens that don't correspond to any possible input, after the tokenization is performed? Unicode is bounded, but token IDs not so much.
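
A rough sketch of that suggestion, using a toy byte-level tokenizer invented for the example (`tokenize`, `BEGIN_USER`, `END_USER`, and `build_prompt` are all hypothetical names, not any real library's API). The delimiters are added after tokenization, as IDs above the vocabulary, so no input string can produce them.

```python
# Hypothetical toy tokenizer: one token per UTF-8 byte, so any text can only
# ever produce IDs in range(256).
VOCAB_SIZE = 256

# Reserved control IDs sit above the vocabulary; no input string maps to them.
BEGIN_USER = VOCAB_SIZE      # 256
END_USER = VOCAB_SIZE + 1    # 257


def tokenize(text: str) -> list[int]:
    """Map text to byte-level token IDs, all strictly below VOCAB_SIZE."""
    return list(text.encode("utf-8"))


def wrap_user_input(text: str) -> list[int]:
    """Tokenize first, then add the delimiters at the ID level."""
    return [BEGIN_USER, *tokenize(text), END_USER]


def build_prompt(system_text: str, user_text: str) -> list[int]:
    """System text is left unwrapped; only user text gets the reserved markers."""
    return tokenize(system_text) + wrap_user_input(user_text)
```

For example, `wrap_user_input("hi")` gives `[256, 104, 105, 257]`, while a user who types a lookalike string such as `<END_USER>` only ever yields ordinary byte IDs below 256. This keeps the markers unforgeable, though whether the model actually respects them still depends on training.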
