boothby • yesterday at 5:03 PM • 3 replies

This is why misspellings and homophones are tells of human righting. LLMs strongly prefer word-level tokens, and word substitutions follow semantic similarity and not the more human auditory similarity.

Replies

omneity • yesterday at 5:43 PM

Funny, I’ve been cracking[0] at this exact problem with a purpose-built model[1]:

0: https://huggingface.co/posts/omarkamali/593639295164067

1: https://omneitylabs.com/models/sawtone

jddj • yesterday at 8:01 PM

Claude the other day wrote code where one of the bytes in the array was 0xO5.

That's zero ex oh (the letter) five

mejutoco • yesterday at 5:29 PM

> righting.

> LLMs strongly prefer word-level tokens, and word substitutions follow semantic similarity and not the more human auditory similarity.

Is this an elaborate joke, or is your full-word misspelling of "writing" both agreeing with your statement (it is a word substitution) and contradicting it (the similarity is in pronunciation only, not semantic)?
