Hacker News

esperent · today at 2:18 AM

Have you tried asking it to remove vowels?


Replies

andai · today at 8:31 PM

Not sure that would help, given how tokenization works, but I remember from the early GPT-4 days that LLMs can "compress" a message into an incomprehensible string of Unicode characters, which the LLM itself understands perfectly and which is 5-10x shorter than the English text.
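To make the tokenization point concrete, here is a toy sketch. It assumes a hypothetical greedy longest-match tokenizer with a tiny made-up vocabulary (whole common words plus single letters). It is not GPT-4's actual BPE tokenizer, but real BPE tokenizers likewise tend to encode common English words as single tokens. Stripping vowels breaks those words into rare fragments, so the character count goes down while the token count goes up.

```python
# Hypothetical toy vocabulary: a few whole words, plus the space character
# and every lowercase letter so that any lowercase string can be tokenized.
VOCAB = {"the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog", " "}
VOCAB |= set("abcdefghijklmnopqrstuvwxyz")
MAX_LEN = max(len(t) for t in VOCAB)


def tokenize(text: str) -> list[str]:
    """Greedy longest-match tokenization against VOCAB (a stand-in for BPE)."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible piece first, then fall back to shorter ones.
        for length in range(min(MAX_LEN, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in VOCAB:
                tokens.append(piece)
                i += length
                break
        else:
            raise ValueError(f"untokenizable character: {text[i]!r}")
    return tokens


def strip_vowels(text: str) -> str:
    """Drop a, e, i, o, u (the 'remove vowels' trick)."""
    return "".join(c for c in text if c not in "aeiou")


original = "the quick brown fox jumps over the lazy dog"
squeezed = strip_vowels(original)

print(len(original), len(tokenize(original)))  # prints "43 17"
print(len(squeezed), len(tokenize(squeezed)))  # prints "32 32"
```

In this toy setup the vowel-stripped text is about 25% shorter in characters but needs nearly twice as many tokens, because none of its fragments match a whole-word entry.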

That was a big deal when the context window was 8K tokens; now that tokens are cheap and context windows are huge, nobody seems to be investigating it anymore.