
crmd · yesterday at 9:11 PM

After reading the paper, it’s helpful to think about why the models are producing these coherent childhood narrative outputs.

The models have information about their own pre-training, RLHF, alignment, etc., because they were trained on a huge body of computer science literature, written by researchers, that describes LLM training pipelines and workflows.

I would argue the models are demonstrating creativity: they draw on their meta-level knowledge of LLM training and their exposure to human psychology texts to convincingly role-play as a therapy patient. But that role-play is grounded in reading papers about LLM training, not in memories of those events.