Hacker News

hodgehog11 · yesterday at 11:45 PM

As someone who works in the area, this provides a decent summary of the most popular research items. The most useful and impressive part is the set of open problems at the end, which just about covers all of the main research directions in the field.

The skepticism I'm seeing in the comments really highlights how little of this work is trickling down to the public, which is sad to see. While the theory offers few mathematical mechanisms for inferring optimal network design yet (mostly because just trying stuff empirically is often faster than working through the theory, so things tend to be inferred retroactively), the question "why do neural networks work better than other models?" is getting pretty close to a solid answer. The problem is that this was never the question people were really interested in, so the field now has to figure out what questions to ask next.


Replies

r0ze-at-hn · today at 8:13 AM

We’re in a strange era where the information-theoretic foundations of deep learning are solidifying. The 'Why' is largely solved: it’s the efficient minimization of irreversible information loss relative to the noise floor. There is so much waste in scaling models bigger and bigger when the math points to how to do it far more efficiently. One can take a great 70B model, run it in only ~16GB with no loss in capability, and still keep training it, but in the last few years funding only went to "bigger".
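A rough back-of-the-envelope on that figure (my arithmetic, not the parent's method): assuming the ~16GB refers to storing the weights alone at an aggressive low bit-width, ignoring activations, KV cache, and optimizer state, a Python sketch of what the numbers imply:

    # Back-of-the-envelope weight-memory arithmetic (illustrative only;
    # assumes the ~16GB figure means low-bit quantization of weights alone).

    def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
        """Memory to store the weights only (no activations, KV cache,
        or optimizer state)."""
        return n_params * bits_per_param / 8 / 1e9

    n_params = 70e9  # a 70B-parameter model

    for bits in (16, 8, 4, 2):
        print(f"{bits:>2} bits/param -> {weight_memory_gb(n_params, bits):6.1f} GB")

    # 16 bits/param -> 140.0 GB  (fp16/bf16 baseline)
    #  8 bits/param ->  70.0 GB
    #  4 bits/param ->  35.0 GB
    #  2 bits/param ->  17.5 GB  (roughly the ~16 GB regime mentioned above)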

As you noted, the industry has moved the goalposts to Agency and Long-horizon Persistence. The transition from building 'calculators that predict' to 'systems that endure' is a non-equilibrium thermodynamics problem. There are formulas and basic laws at play here that apply to AI just as much as they apply to other systems. Ironically, it is the same math: whatever lets a signal persist in a model is what lets agents persist.

This is my specific niche. I study how things persist. It’s honestly a bit painful watching the AI field struggle to re-learn first principles that other disciplines have already learned. I have a doc I use to teach folks how the math works and how to apply it to their domain, and it is fun giving it to folks who then stop guessing and know exactly how to improve the persistence of what they are working on. The question "how many hours can we have a model work?" is cute compared to the right questions.

chadcmulligan · today at 1:58 AM

"why do neural networks work better than other models?" That sounds really interesting - any references (for a non specialist)?

niksmather · today at 6:22 AM

Do neural networks work better than other models? They can definitely model a wider class of problems than traditional ML models (images being the canonical example). However, I thought that where a like-for-like comparison is possible, they tend to do worse than gradient boosting.
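For concreteness, here is a minimal sketch of what such a like-for-like tabular comparison might look like in scikit-learn. The synthetic dataset and hyperparameters are arbitrary placeholders, not a benchmark, and results on real tabular data will vary:

    # Minimal like-for-like tabular comparison sketch (scikit-learn);
    # synthetic data and arbitrary hyperparameters, not a benchmark.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import HistGradientBoostingClassifier
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.model_selection import cross_val_score

    # Synthetic tabular data standing in for a real dataset.
    X, y = make_classification(n_samples=5000, n_features=30,
                               n_informative=15, random_state=0)

    models = {
        "gradient boosting": HistGradientBoostingClassifier(random_state=0),
        "small MLP": make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500,
                          random_state=0)),
    }

    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=5)
        print(f"{name:>17}: {scores.mean():.3f} +/- {scores.std():.3f}")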

cookiengineer · today at 4:10 AM

In my opinion current research should focus on revisiting older concepts to figure out if they can be applied to transformers.

Transformers are superior "database" encodings, as the hype around LLMs shows, but there have been promising ML models that focused on memory for their niche use cases. These could become promising again if we could make them work with attention matrices and/or apply the frequency projection idea to their neuron weights.

The way RNNs evolved into LSTMs, GRUs, and eventually DNCs was pretty interesting to me. In my own implementations and use cases I wasn't able to reproduce DeepMind's claims for the DNC's memory-related parts. Back then, the "seeking heads" idea behind attention matrices wasn't there yet; maybe there's a way to build better read/write/access gates now.

[1] a fairly good implementation I found: https://github.com/joergfranke/ADNC
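To make the memory-addressing part concrete, here is a minimal NumPy sketch of content-based addressing, the core of a DNC-style read head (cosine similarity to a key, sharpened by a key strength, then a weighted read). It is illustrative only and not the API of the linked ADNC implementation:

    # Minimal DNC-style content-based read (NumPy); illustrative only,
    # not the ADNC repo's API.
    import numpy as np

    def content_based_read(memory: np.ndarray, key: np.ndarray,
                           beta: float) -> np.ndarray:
        """Read from memory by cosine similarity to a lookup key.

        memory: (N, W) matrix of N slots, each a W-dim word
        key:    (W,) lookup key emitted by the controller
        beta:   key strength (sharpness of the softmax over slots)
        """
        # Cosine similarity between the key and every memory slot.
        sims = memory @ key / (np.linalg.norm(memory, axis=1)
                               * np.linalg.norm(key) + 1e-8)
        # Soft attention over slots, sharpened by beta.
        weights = np.exp(beta * sims)
        weights /= weights.sum()
        # The read vector is the attention-weighted sum of memory slots.
        return weights @ memory

    memory = np.random.randn(16, 8)               # 16 slots, 8-dim words
    key = memory[3] + 0.1 * np.random.randn(8)    # noisy copy of slot 3
    print(content_based_read(memory, key, beta=5.0))

Write heads in the DNC add allocation and erase/add vectors on top of this same addressing scheme, which is where the gating questions above come in.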

mathisfun123 · today at 3:26 AM

> why do neural networks work better than other models

The only people for whom this is an open question are the academics - everyone else understands it's entirely because of the bagillions of parameters.
