I asked Claude for 37,500 random names, and it can't stop saying Marcus

52 points • by benjismith • today at 4:23 PM • 45 comments

Comments

deepsquirrelnet • today at 7:47 PM

Ask an LLM to pick a random number from 1 to 10. My money is on 7.

This is known to be a form of collapse caused by RL training: base models do not exhibit it [1].

1. https://arxiv.org/abs/2505.00047
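To put a number on the "my money is on 7" bias, one option is a Pearson chi-squared test of the collected answers against a uniform distribution over 1 to 10. This is a minimal sketch; the tally below is made up for illustration and is not data from the article or from [1].

```python
from collections import Counter

def chi_squared_uniform(answers, low=1, high=10):
    """Pearson chi-squared statistic of integer answers against a uniform
    distribution over low..high (inclusive)."""
    counts = Counter(answers)
    k = high - low + 1
    expected = len(answers) / k
    return sum((counts.get(v, 0) - expected) ** 2 / expected
               for v in range(low, high + 1))

# Hypothetical tally of 100 answers in which 7 dominates.
answers = [7] * 40 + [3] * 15 + [5] * 15 + [4] * 10 + [6] * 10 + [8] * 10
stat = chi_squared_uniform(answers)
# With 10 - 1 = 9 degrees of freedom, the 5% critical value is about 16.92.
print(f"chi2 = {stat:.1f}, uniform rejected: {stat > 16.92}")
```

A perfectly flat tally scores 0; the larger the statistic, the less plausible it is that the answers came from a uniform pick.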


paxys • today at 7:33 PM

The part about injecting randomness is the most interesting bit of the article.

So if you want your LLM responses to be more varied (beyond what setting the temperature allows), add some random English words to the start of the prompt.
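A minimal sketch of that idea: prepend a few randomly chosen words so each request starts from a different context. The `WORDS` list and the `salted_prompt` function are hypothetical; the thread doesn't show the article's exact prompt format, and whether to tell the model to ignore the words is left open here.

```python
import random

# Hypothetical word list; any sizable English vocabulary would do.
WORDS = ["lantern", "quarry", "velvet", "orbit", "thistle", "copper",
         "harbor", "meadow", "glacier", "tundra", "saffron", "pylon"]

def salted_prompt(prompt, n_words=5, rng=random):
    """Prepend n_words randomly chosen words so that each request starts
    from a different context, nudging the model toward different outputs."""
    salt = " ".join(rng.choice(WORDS) for _ in range(n_words))
    return f"{salt}\n\n{prompt}"

print(salted_prompt("Give me one random first name.", rng=random.Random(42)))
```

The randomness here comes from Python's generator rather than from the model, which is the point: the model's output varies because its input does.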


zone411 • today at 8:30 PM

I've made top-10 lists of LLMs' favorite names to use in creative writing here: https://x.com/LechMazur/status/2020206185190945178. They often recur across different LLMs. For example, they love Elara and Elias.

anotheryou • today at 7:41 PM

Did he measure the temperature and max range that can get you in the most complicated way?

Interesting:

- Marcus is not in this top list: https://www.ssa.gov/oact/babynames/decades/century.html

- Marcus is its own token in tiktoken (but so are many names from that list)
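On the temperature question: samplers typically divide the logits by the temperature before the softmax. A minimal sketch with made-up logits shows why temperature alone only goes so far: it flattens or sharpens the distribution but, because dividing by a positive number preserves order, the favorite stays the most likely choice at any positive temperature.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by temperature.
    temperature < 1 sharpens the distribution; > 1 flattens it."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate names, favorite first.
logits = [4.0, 2.0, 1.0]
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```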

isoprophlex • today at 8:12 PM

This is of course entirely expected. You can partly circumvent it by asking for a long array of names and sampling a randomly chosen element near the end of the list. Say, ask for 50 names and use one of elements 41 through 50, chosen at random.

Not perfect, and more expensive, but it helps a little. This works by letting the sampler's non-zero temperature seed the attention randomness, similar to prepending other random tokens (but more in-band).
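A minimal sketch of the tail-picking step, assuming the model's 50-name reply has already been parsed into a Python list (the parsing and the API call are not shown, and `pick_from_tail` is a hypothetical helper):

```python
import random

def pick_from_tail(names, tail=10, rng=random):
    """Return one element chosen uniformly from the last `tail` items.
    With 50 names and tail=10, this picks one of elements 41 through 50."""
    if not names:
        raise ValueError("empty list")
    tail = min(tail, len(names))
    return rng.choice(names[-tail:])

# Stand-in for a parsed 50-name reply.
reply = [f"name{i}" for i in range(1, 51)]
print(pick_from_tail(reply, rng=random.Random(0)))
```

The choice among the last few items is made outside the model, so any position-specific bias in the tail is averaged out across requests.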

Asking for arrays of uniformly or normally distributed numbers is fun too: you can plot the distribution of the n-th element and watch the distributions converge to something not quite entirely unlike what you asked for.

Often there's some bias between element indices too. For example, if you repeat the experiment a large number of times, you will still see even-numbered items converge to a different distribution than odd-numbered items, especially for early elements. Hence the stochastic averaging trick over the last few elements.
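A plot-free sketch of that analysis: collect one parsed array per repeated request, then compare statistics per index and pooled over odd vs even positions. The sample `runs` are made up, and counting positions from 1 is an assumption, since the comment doesn't say how items are numbered. The per-index values can be fed to any plotting library for the histograms described above.

```python
from statistics import mean

def per_index_means(runs):
    """runs: one list per repeated request, all the same length.
    Returns the mean value observed at each index across runs."""
    return [mean(column) for column in zip(*runs)]

def even_odd_means(runs, first_n=None):
    """Pool values at odd vs even 1-based positions (items 1, 3, ... vs
    2, 4, ...), optionally only over the first `first_n` items of each run."""
    odd, even = [], []
    for run in runs:
        for pos, value in enumerate(run[:first_n], start=1):
            (even if pos % 2 == 0 else odd).append(value)
    return mean(odd), mean(even)

# Hypothetical arrays of "uniform" numbers from two repeated requests.
runs = [
    [0.1, 0.9, 0.2, 0.8],
    [0.3, 0.7, 0.1, 0.9],
]
print(per_index_means(runs))
print(even_odd_means(runs))
```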

samwho • today at 8:29 PM

I wrote a tool called llmwalk (https://github.com/samwho/llmwalk) that deterministically shows you the likelihoods of the top N answers for a given open model and prompt. No help on frontier models, but maybe helpful if you want to run a similar analysis more quickly on open models!
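One generic way to get the top N answers deterministically is a best-first search over next-token probabilities. This is a sketch of that technique, not of how llmwalk is implemented; `TOY`, `next_probs`, and `END` are hypothetical stand-ins for what a real open model's logits would provide.

```python
import heapq
import math

# Hypothetical toy "model": maps a token prefix to next-token probabilities.
TOY = {
    (): {"Marcus": 0.55, "El": 0.45},
    ("El",): {"ara": 0.6, "ias": 0.4},
}
END = "<end>"

def next_probs(prefix):
    """Any prefix not in the table ends the answer with probability 1."""
    return TOY.get(prefix, {END: 1.0})

def top_n_answers(n):
    """Best-first search: always expand the most likely partial answer.
    Costs only grow along a path, so answers come out most likely first."""
    heap = [(0.0, ())]  # (negative log-probability, token prefix)
    results = []
    while heap and len(results) < n:
        neg_logp, prefix = heapq.heappop(heap)
        if prefix and prefix[-1] == END:
            results.append(("".join(prefix[:-1]), math.exp(-neg_logp)))
            continue
        for token, p in next_probs(prefix).items():
            heapq.heappush(heap, (neg_logp - math.log(p), prefix + (token,)))
    return results

for answer, prob in top_n_answers(3):
    print(f"{answer}: {prob:.2f}")
```

With a real model, each `next_probs` call would be a forward pass, so in practice you would also cap the answer length and the heap size.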

figassis • today at 7:53 PM

I think for a lot of these things the AI needs to be able to understand its limitations and address them with code. It could just pull a name dictionary from wherever and write a random algorithm to output the names.

sjkoelle • today at 7:33 PM

Amara must be this dataset: https://en.wikipedia.org/wiki/Amara_(organization)

saaaaaam • today at 8:15 PM

I’ve had this issue (via chat, rather than the API), but it kept saying Dorian. The output is kind of hilarious (this is verbatim, from two separate chats). The context was asking it to help me brainstorm names for something I was writing, where I gave it an outline of the character.

Dorian - Too on the nose, given Wilde and the period. Probably avoid. Vivian - Gender-ambiguous, aesthetic associations. Wilde used it (in "The Decay of Lying"). Suggests a man comfortable with ambiguity. Hugo - Continental dash. A man who's been to Paris and wants you to know it. Dorian - Too obvious. Rupert - Regency rake energy. The kind of man who'd own theatres and keep a mistress openly. Dorian - Already said no. Dorian - I keep typing it. Definitely no.

Alexander Dorian... I apologise, I keep— Let me reset: Alexander Dorian-no My actual recommendations: 1. Alexander Dorian— I apologise. I'm having a technical issue with one particular word. Let me try once more:

_joel • today at 8:10 PM

PartridgeGPT would just give, "Dan, Dan..... Daaaan!"


clucas • today at 7:50 PM

Maybe it's just a big Stella fan: https://www.youtube.com/watch?v=ZxVIGXlSW-k

goodmythical • today at 7:59 PM

"this just in, tool behaves predictably outside of imagined specification"

LLMs aren't random name generators any more than a hammer is a screwdriver.

Ask it to write a script that selects a random number and maps it to an entry in a list of first names, then selects a second random number and maps it to an entry in a list of last names.

Presto bang-o, you've got a bespoke random name generator.
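A minimal sketch of such a script, following the two-random-indices recipe above. The name lists are hypothetical placeholders; a real script would load much larger lists.

```python
import random

# Hypothetical sample lists; a real script would load larger ones,
# e.g. from a census or baby-name dataset.
FIRST_NAMES = ["Ava", "Mateo", "Priya", "Kwame", "Ingrid", "Hiro", "Lucia", "Marcus"]
LAST_NAMES = ["Okafor", "Lindqvist", "Tanaka", "Moreau", "Castillo", "Nguyen"]

def random_names(count, rng=random):
    """Build each name from one random index into each list."""
    names = []
    for _ in range(count):
        first = FIRST_NAMES[rng.randrange(len(FIRST_NAMES))]
        last = LAST_NAMES[rng.randrange(len(LAST_NAMES))]
        names.append(f"{first} {last}")
    return names

for name in random_names(5, rng=random.Random(2024)):
    print(name)
```

Passing `secrets.SystemRandom()` as `rng` draws from the operating system's entropy source instead of a seeded generator; the seeded `random.Random` is handy when you want reproducible output.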

Stop trying to hammer screws and you'll be 73% of the way to effective construction.

ETA: Gemini answered "generate 1000 random names in a CSV in the form 'first name, last name'" with a sample list of 100 unique names and a Python script that I didn't ask for but it thought I might like.

And prompting Haiku with "generate 1000 unique random names in the format 'first name last name'" gave me exactly 1000 unique names, with no repeats and zero Marcuses.
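Checking a claim like "no repeats and zero Marcuses" takes a few lines once the reply is split into lines. A minimal sketch; `name_report` is a hypothetical helper and the sample list is made up.

```python
from collections import Counter

def name_report(lines):
    """Summarize a 'first name last name' list: duplicate full names,
    the number of Marcuses, and the most common first names."""
    names = [line.strip() for line in lines if line.strip()]
    full = Counter(names)
    firsts = Counter(name.split()[0] for name in names)
    return {
        "total": len(names),
        "unique": len(full),
        "duplicates": {name: c for name, c in full.items() if c > 1},
        "marcus": firsts.get("Marcus", 0),
        "top_first": firsts.most_common(3),
    }

sample = ["Marcus Chen", "Elara Voss", "Marcus Chen", "Priya Nair", ""]
print(name_report(sample))
```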


josefritzishere • today at 7:53 PM

LLMs don't really do random.


quercusa • today at 7:47 PM

Envisioning an update to https://xkcd.com/221/

nottorp • today at 8:23 PM

It lost context at name #8300 :)

_dwt • today at 7:23 PM

Gary Marcus is living in Claude's head rent-free?


lokimedes • today at 7:54 PM

Marcus is pretty random.

EuanReid • today at 7:27 PM

I suppose it appears a bunch in training data. Marcus Aurelius and Marcus Crassus get mentioned a lot through history.


wyldfire • today at 7:36 PM

"I expected an automaton to be a good source of entropy and it turns out it is not."

BTW, the LLM here is doing a great job of emulating humans, who are not good at this task either.

> Nine parameter combinations produced zero entropy — perfectly deterministic output

They'd need some kind of special training to request entropy from a system entropy device. Behaving deterministically is a feature, not a bug.
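For the quoted "zero entropy" figure, one common measure is the Shannon entropy of the empirical distribution of outputs over repeated requests. The thread doesn't show the article's exact metric, so this is a sketch assuming that definition; a result of 0 bits means every request returned the same name.

```python
import math
from collections import Counter

def shannon_entropy_bits(samples):
    """Shannon entropy, in bits, of the empirical distribution of samples.
    0.0 means every sample was identical (fully deterministic output)."""
    counts = Counter(samples)
    n = len(samples)
    # p * log2(1/p) summed over outcomes; log2(n/c) >= 0 avoids a -0.0 result.
    return sum((c / n) * math.log2(n / c) for c in counts.values())

print(shannon_entropy_bits(["Marcus"] * 10))
print(shannon_entropy_bits(["Marcus", "Elara", "Dorian", "Priya"]))
```

By contrast, a name picked with `secrets.choice(names)` draws on the operating system's randomness source, which is the kind of entropy the comment says the model itself can't reach.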


agluszak • today at 7:32 PM

Marcus the Worm[1] infected Claude

[1] - https://www.youtube.com/shorts/9p0CwDNM9Ps
