Hacker News

blackbear_ today at 6:55 AM

The GPT-3 paper is a good starting point:

Language Models are Few-Shot Learners https://arxiv.org/abs/2005.14165

I also enjoyed the DeepSeek and GLM papers for an overview of all the tricks you need to make these things work:

DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models https://arxiv.org/abs/2512.02556

GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models https://arxiv.org/abs/2508.06471