Hacker News

singpolyma3 today at 2:56 AM (6 replies)

Next do "why LLMs work"


Replies

inkysigma today at 9:22 AM

This is essentially an open research question. ML theory is unfortunately very weak relative to where the empirics are. I think there's a relatively optimistic paper that was posted a while back here but I would also take it with a grain of salt.

https://arxiv.org/abs/2604.21691

There are of course empirical results and relatively weak theoretical results like the UAT (universal approximation theorem), but I don't think those answer your question fully, especially since it seems impossible to definitively answer the questions the industry seems to be betting on, like whether there is a lower bound on the error rate or whether hallucination can be solved. We have a much stronger idea of what linear regression is doing than of what LLMs are doing.

sheeshkebab today at 3:21 AM

considering they work with any architecture/configuration given enough compute, just more or less efficiently - then maybe it's fundamental, in the same sense as why electricity works...

krackers today at 4:58 AM

See Tegmark's "Why does deep and cheap learning work so well?" (well, not so cheap anymore...)

https://www.youtube.com/watch?v=5MdSE-N0bxs is remarkably prescient given that it was written before LLMs

soupspaces today at 3:33 AM

Universal approximation theorem, embeddings, self-attention, gradient descent. And empirically, scaling laws.

qsera today at 10:56 AM

Because there are patterns everywhere!

skydhash today at 3:59 AM

Why does linear regression work? Why does a computer work? Because it's about math and encoding information. If we can encode words as numbers, then why can't we encode their order as a relation? It's just that neural networks are very apt at finding that relation even if it's noisy.
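
To make "encode words as numbers, then encode their order as a relation" concrete, here is a minimal sketch. It is not how an LLM works: it swaps the neural network for a plain bigram count table, which is about the simplest relation over word order there is. The function names `fit_bigrams` and `predict_next` and the toy corpus are made up for illustration.

```python
from collections import Counter, defaultdict


def fit_bigrams(text):
    """Map each word to an integer id and count which id follows which."""
    words = text.lower().split()
    # dict.fromkeys keeps first-seen order, so ids are stable and reproducible.
    vocab = {w: i for i, w in enumerate(dict.fromkeys(words))}
    ids = [vocab[w] for w in words]
    follows = defaultdict(Counter)
    for prev, nxt in zip(ids, ids[1:]):
        follows[prev][nxt] += 1
    return vocab, follows


def predict_next(word, vocab, follows):
    """Return the most frequent successor of `word`, or None if there is none."""
    inverse = {i: w for w, i in vocab.items()}
    counts = follows.get(vocab.get(word))
    if not counts:
        return None
    best_id, _ = counts.most_common(1)[0]
    return inverse[best_id]


# Hypothetical toy corpus, chosen only so the counts are easy to check by hand.
corpus = "the cat sat on the mat and the cat slept"
vocab, follows = fit_bigrams(corpus)
print(predict_next("the", vocab, follows))  # "cat" follows "the" twice, "mat" once
```

A count table like this only sees the previous word and falls over on anything it has not seen. The point of the comment above is that a neural network learns a far richer, noise-tolerant version of the same kind of relation.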