Next do "why LLMs work" | alt Hacker News

singpolyma3 • today at 2:56 AM • 6 replies

Next do "why LLMs work"

Replies

This is essentially an open research question. ML theory is unfortunately very weak relative to where the empirics are. I think a relatively optimistic paper was posted here a while back, but I would take it with a grain of salt.

https://arxiv.org/abs/2604.21691

There are of course empirical results and relatively weak theoretical results like the UAT (universal approximation theorem), but I don't think those answer your question fully, especially since it seems impossible to definitively answer questions that the industry seems to be betting on, like whether there is a lower bound on their error rate or whether hallucination as a problem can be solved. We have much stronger ideas of what linear regression is doing than of what LLMs are doing.

sheeshkebab • today at 3:21 AM

Considering they work with any architecture/configuration given enough compute, just more or less efficiently, maybe it's fundamental, in the same sense as why electricity works...

krackers • today at 4:58 AM

See Tegmark's "Why does deep and cheap learning work so well?" (well, not so cheap anymore...)

https://www.youtube.com/watch?v=5MdSE-N0bxs is remarkably prescient given that it was written before LLMs

soupspaces • today at 3:33 AM

Universal approximation theorem, embeddings, self-attention, gradient descent. And empirically, scaling laws.
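For anyone who wants to see what "self-attention" amounts to mechanically, here is a minimal pure-Python sketch of scaled dot-product self-attention. It assumes queries, keys, and values are all the raw token vectors (no learned projection matrices, no multiple heads), and the three 2-d "embeddings" are made up for illustration, so treat it as a toy, not how any real model is implemented.

```python
import math

def softmax(row):
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X):
    """Scaled dot-product self-attention over a list of token vectors.

    Assumption: queries, keys, and values are all X itself (identity
    projections); a real layer learns separate weight matrices.
    """
    d = len(X[0])
    out, weights = [], []
    for q in X:
        # Similarity of this token's query to every token's key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in X]
        w = softmax(scores)
        weights.append(w)
        # Output is a weighted average of the value vectors.
        out.append([sum(wj * v[i] for wj, v in zip(w, X)) for i in range(d)])
    return out, weights

# Hypothetical 2-dimensional embeddings for three tokens.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = self_attention(X)
```

Each row of `weights` sums to 1, so every output vector is just a weighted average of the inputs, with the weights decided by how similar the tokens are to each other.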

qsera • today at 10:56 AM

Because there are patterns everywhere!

skydhash • today at 3:59 AM

Why does linear regression work? Why do computers work? Because it's about math and encoding information. If we can encode words as numbers, then why can't we encode their order as a relation? It's just that neural networks are very apt at finding that relation even if it's noisy.
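A toy sketch of the "encode words as numbers, then encode their order as a relation" idea, in Python. The sample sentence and the function names are made up for illustration, and this is a plain bigram count table rather than a neural network; it only shows that word order can be stored as a numeric relation at all.

```python
from collections import defaultdict

def build_vocab(words):
    # Assign each distinct word an integer id in order of first appearance.
    vocab = {}
    for w in words:
        vocab.setdefault(w, len(vocab))
    return vocab

def bigram_counts(ids):
    # counts[a][b] = how many times id b directly follows id a.
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(ids, ids[1:]):
        counts[a][b] += 1
    return counts

def most_likely_next(counts, a):
    # Return the id that most often followed a, or None if a was never followed.
    followers = counts.get(a)
    if not followers:
        return None
    return max(followers, key=followers.get)

# Made-up sample text, purely for illustration.
text = "the cat sat on the mat the cat ran".split()
vocab = build_vocab(text)
ids = [vocab[w] for w in text]
counts = bigram_counts(ids)
id_to_word = {i: w for w, i in vocab.items()}
prediction = id_to_word[most_likely_next(counts, vocab["the"])]
print(prediction)  # cat
```

A neural network replaces the exact count table with a learned, smoothed function of the same kind of relation, which is what lets it cope with noise and with word sequences it has never seen.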