
cold_harbor • today at 11:38 AM

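LoRA won't fix the tokenization problem. Norwegian on a typical English-heavy BPE vocab uses 1.5-2x more tokens per word than English does, because the vocab has few merges for Norwegian word pieces. LoRA only adapts weights and leaves the vocab as it is, so that overhead compounds into real inference cost (more tokens per request, less text per context window), not just a quality hit.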
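If you want to check the ratio on your own data, here is a rough sketch of the arithmetic. The function names and `toy_tokenize` are made up for illustration; `toy_tokenize` is a stand-in, not a real BPE tokenizer, so pass your actual tokenizer's encode function as `tokenize` to get meaningful numbers.

```python
from typing import Callable, List, Sequence


def tokens_per_word(text: str, tokenize: Callable[[str], Sequence]) -> float:
    """Average number of tokens per whitespace-separated word."""
    words = text.split()
    if not words:
        raise ValueError("text has no words")
    return len(tokenize(text)) / len(words)


def relative_cost(target_ratio: float, baseline_ratio: float) -> float:
    """How many times more tokens the target language needs for the same word count.

    Assumes cost scales linearly with token count, which is a simplification.
    """
    return target_ratio / baseline_ratio


def toy_tokenize(text: str, chunk: int = 4) -> List[str]:
    """Hypothetical stand-in tokenizer: cuts each word into fixed-size chunks.

    Only here so the sketch runs without dependencies; it does not model BPE.
    """
    pieces: List[str] = []
    for word in text.split():
        pieces.extend(word[i:i + chunk] for i in range(0, len(word), chunk))
    return pieces
```

Run `tokens_per_word` over comparable Norwegian and English samples with the same tokenizer, then feed both ratios to `relative_cost` to get the per-word token multiplier.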
