This article describes how Transformers work, but not really how LLMs work. Explaining the underlying architecture gives you about as much insight into how a modern LLM behaves as a breakdown of neuronal biochemistry and a few pathways gives you into how the brain behaves. Meaning, almost no insight at all.