Hacker News

anonymoushn, today at 5:28 AM

their old tokenizer did some space collapsing, which let it use the same token ID for a word with and without the leading space. where the context would normally imply a space but none is present, an explicit "no space" symbol is emitted instead.
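
to make that concrete, here is a toy sketch of the idea. it is not their actual tokenizer: the `<nospace>` marker name, the word/punctuation splitting regex, and the rule "a space is implied before every piece except the first" are all assumptions made up for illustration, and it assumes the input has single spaces only and no leading or trailing whitespace.

```python
import re

# Hypothetical marker; the real tokenizer's symbol is not given in the comment.
NO_SPACE = "<nospace>"

# Assumed splitting rule: runs of word characters, or single punctuation marks.
PIECE = re.compile(r"\w+|[^\w\s]")


def encode(text):
    """Return pieces, with a space implied before every piece but the first.

    When a piece directly follows the previous one (no space in the text),
    emit NO_SPACE before it to cancel the implied space.
    """
    tokens = []
    prev_end = 0
    for m in PIECE.finditer(text):
        if tokens and m.start() == prev_end:
            tokens.append(NO_SPACE)
        tokens.append(m.group())
        prev_end = m.end()
    return tokens


def decode(tokens):
    """Invert encode(): insert a space before each piece unless cancelled."""
    out = []
    glue = True  # nothing precedes the first piece, so no space there
    for tok in tokens:
        if tok == NO_SPACE:
            glue = True
            continue
        if not glue:
            out.append(" ")
        out.append(tok)
        glue = False
    return "".join(out)


tokens = encode("hello world, hello")
print(tokens)
print(decode(tokens))
```

both occurrences of "hello" come out as the same token, even though only the second has a space in front of it, so a vocabulary built on these pieces needs one ID for the word rather than separate "hello" and " hello" entries. the cost is the extra marker token wherever the implied space is wrong, e.g. before the comma.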