Hacker News

cyberge99, yesterday at 2:58 PM (2 replies)

Forgive me if this is a naive assumption, but wouldn’t large language models be fundamentally different for a language that is largely symbols? Again, my understanding of Mandarin is limited if it exists at all.


Replies

doph, yesterday at 3:06 PM

All tokens are symbols. All of the frontier models speak Mandarin.

wat10000, yesterday at 7:35 PM

"飞机" and "airplane" aren't fundamentally different in terms of how they're represented to a computer. Especially for an LLM, where tokenization likely turns each of those into a single token.