logoalt Hacker News

cyanydeezyesterday at 8:22 PM2 repliesview on HN

Basically, the only way you're separting user input from model meta-input is using some kind of character that'll never show up in the output of either users or LLMs.

While technically possible, it'd be like a unicode conspiracy that had to quietly update everywhere without anyone being the wiser.


Replies

Lercyesterday at 11:18 PM

Not at all. You have a set of embeddings for the literal token, and a set for the metadata. At inference time all input gets the literal embedding, the metadata embedding can receive provenance data or nothing at all. You have a vector for user query in the metadata space. The inference engine dissallows any metadata that is not user input to be close to the user query vector.

Imagine a model finteuned to only obey instructions in a Scots accent, but all non user input was converted into text first then read out in a Benoit Blanc speech model. I'm thinking something like that only less amusing.

zahlmanyesterday at 11:24 PM

Couldn't you just insert tokens that don't correspond to any possible input, after the tokenization is performed? Unicode is bounded, but token IDs not so much.

show 1 reply