Hacker News

prodigycorp today at 1:16 PM

Random aside about training data:

One of the funniest things I've started to notice from Gemini in particular is that, in random situations, it writes English with an agreeable affect that I can only describe as... Indian? I've never noticed such a thing leak through before. There must be a ton of people in India who are generating new datasets for training.



evntdrvn today at 3:07 PM

There was a really great article or blog post published in the last few months about the author's very personal experience. The gist was: "People complain that I sound/write like an LLM, but it's actually the inverse, because I grew up in X, where people are taught formal English to sound educated/Western, and those areas are now heavily used for LLM training."

I wish I could find it again. If anyone knows the link, please post it!

blenderob today at 1:51 PM

That's very interesting. Can you share any examples that show that agreeable affect?
