Here's where my understanding falls short: for decades the idea of neural networks existed with minimal attention. Then in 2017 "Attention Is All You Need" was released, and since then there has been an exponential explosion in deep learning. I understand that deep learning is accelerated by GPUs, but the concept of a transformer could have been run on much slower hardware much earlier.
This is encouraging. The title is a bit much. "Potential points of attack for understanding what deep learning is really doing" would be more accurate but less attention-grabbing.
It might lead to understanding how to measure when a deep learning system is making stuff up or hallucinating. That would have a huge payoff. Until we get that, deep learning systems are limited to tasks where the consequences of outputting bullshit are low.
Honestly, I found these two attempts at universal theory more interesting:
https://arxiv.org/abs/2510.12269
https://www.mdpi.com/1099-4300/28/3/332
I am also interested in the connection with fuzzy logic - it seems that NNs can reason in a fuzzy way, but what are they doing, formally? For years people tried to formalize fuzzy reasoning, but it looks like we don't care anymore.
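As a toy illustration of that connection (my own sketch, not anything from the paper): classical fuzzy logic uses explicit connectives such as the Gödel t-norm, while a single sigmoid unit can be hand-tuned to behave like a soft AND on [0, 1] inputs. The weights and bias below are arbitrary choices made purely for illustration.

```python
import math

def fuzzy_and(a, b):
    # Gödel t-norm: the classical fuzzy AND
    return min(a, b)

def fuzzy_or(a, b):
    # Gödel t-conorm: the classical fuzzy OR
    return max(a, b)

def soft_and(a, b, w=10.0, bias=-15.0):
    # A single sigmoid neuron tuned so it roughly acts as AND on [0, 1]
    # inputs; w and bias are hand-picked here, not learned.
    return 1.0 / (1.0 + math.exp(-(w * a + w * b + bias)))

print(fuzzy_and(0.9, 0.8))  # exact t-norm value: 0.8
print(soft_and(0.9, 0.8))   # high (both inputs near 1)
print(soft_and(0.9, 0.1))   # low (one input near 0)
```

The difference is telling: the t-norm is a fixed, interpretable formula, while the neuron only approximates the same behavior and its "logic" lives in continuous weights, which is exactly what makes a formal account hard.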
I feel like NNs (and transformers) are the OOP (object-oriented programming) of ML. Really popular, works pretty well in practice, but nobody understands the fundamentals; there is a feeling it is a made up new language to express things expressible before, but hard to pinpoint where exactly it helps.
Hopefully the day when some vendors market AI as a divine entity will soon be over.
I'm only partially through this paper, but it's written in a very engaging and thoughtful manner.
There is so much to digest here but it's fascinating seeing it all put together!
Theory becomes critical when you need to predict failure modes. A decision support system that 'just works' most of the time but fails silently on edge cases is worse than a simpler system with known limitations. Understanding the bias mechanisms would help us know when a model is confident vs when it's just pattern matching. That distinction matters when the stakes are high.
Deep learning works at a very high level because "it can keep learning from more data" better than any other approach. But without the "stupid amount of data" that is available now, the architecture would be kind of irrelevant. Unless you go some way toward explaining both sides of the model-data equation, I don't feel you have a solid basis for a scientific theory, e.g. "why reasoning models can reason". The model is the product of both the architecture and the training data.
My fear is that this is as hopeless right now as explaining why humans or other animals can learn certain things from their huge amount of input data. We'll gain better empirical understanding, but it won't ever be fundamental computer science again, because the giga-datasets are the fundamental complexity not the architecture.
> We argue complexity conceals underlying regularity, and that deep learning will indeed admit a scientific theory
That would be amazing, but personally I’m skeptical.
Wait a min. Does this paper say we don't know how back-propagation works?
wow.. this would be cool. Instead of just.. guessing "shapes"
I have a "theory" that will be wrong, but for a reasonable consideration I can "theorise" in the the other direction.
I think we need the equivalent of general relativity for latent spaces.
Is there not some Rice's Theorem equivalent for deep nets? After all, they are machines that are randomly generated, so from classical computer science I would not presume a theory of "what do all deep nets do" to be prima facie logically possible. Nor do I see this addressed in the objections section.
Well, "There Will Be a Scientific Theory of Deep Learning" looks like flag planting - an academic variant of "I told you so!", but one that is a citation magnet.
"A New Kind of Science" ...
I’m in the skeptical camp. Whatever theory eventually emerges will not be as solid as:

1. The theory of pattern recognition (as developed in the 80s and 90s)
2. Thermodynamics
3. Gravity
4. Electromagnetism
5. Relativity

etc., for two reasons:

1. While half of deep learning is how humans construct the architecture of networks, the more important half relies on data. This data is a hodgepodge of scraped internet data (text and videos), books, user interactions, etc., which really has no coherent structure.

2. To extract meaningful insights from this much data, it takes models of enormous size, like 10B+ parameters. The thing about random systems (in the mathematical sense) is that it takes "something" an order of magnitude bigger to "understand" them, unless there are concentration-of-measure type mathematical niceties (as in thermodynamics), which I don't think are present in these models and data.

This is the same reason I don't think humans will ever be able to "understand" human consciousness: it would take something an order of magnitude bigger than our own brains. Here is Terence Tao explaining this concentration stuff in another context: https://mathstodon.xyz/@tao/113873092369347147

I would love to be proven wrong though.
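The concentration-of-measure effect appealed to above can be seen numerically (a quick sketch of my own, not taken from the linked post): the norm of an n-dimensional standard Gaussian vector concentrates ever more tightly around sqrt(n) as n grows, which is the kind of "mathematical niceness" the comment says thermodynamics has.

```python
import math
import random

random.seed(0)

def norm_spread(n, trials=200):
    # Standard deviation of ||x|| / sqrt(n) over random standard
    # Gaussian vectors x of dimension n. Concentration of measure
    # predicts this shrinks roughly like 1 / sqrt(2n).
    ratios = []
    for _ in range(trials):
        x = [random.gauss(0.0, 1.0) for _ in range(n)]
        ratios.append(math.sqrt(sum(v * v for v in x)) / math.sqrt(n))
    mean = sum(ratios) / trials
    var = sum((r - mean) ** 2 for r in ratios) / trials
    return math.sqrt(var)

for n in (10, 100, 10_000):
    print(n, norm_spread(n))  # spread shrinks as the dimension grows
```

Whether trained networks and web-scale corpora enjoy any comparable regularity is, of course, exactly the open question.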
As someone who works in the area, this provides a decent summary of the most popular research items. The most useful and impressive part is the set of open problems at the end, which just about covers all of the main research directions in the field.
The skepticism I'm seeing in the comments really highlights how little of this work is trickling down to the public, which is sad to see. While the field can offer few mathematical mechanisms for inferring optimal network design yet (mostly because just trying stuff empirically is often faster than going through the theory, so insights tend to be inferred retroactively), the question "why do neural networks work better than other models?" is getting pretty close to a solid answer. The problem is, that was never the question people were really interested in, so the field now has to figure out what questions to ask next.