Isn't that over-simplifying it a bit too much? You can go another step - a FFN can be simulat...

dist-epoch • today at 11:14 AM • 2 replies

Isn't that over-simplifying it a bit too much?

You can go another step: an FFN can be simulated on a Turing machine, so it just exemplifies the incredible expressive power of the Turing machine model of computation. (In fact you don't even need a Turing machine, since there is no looping in one forward pass.)

In theory you can run a huge FFN on the tiniest Turing machine; in practice it's much better to run a Transformer on the latest NVIDIA hardware. Or, as they say, "quantity (performance) has a quality all its own."
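To make the "no looping in one forward pass" point concrete, here is a minimal sketch in plain Python. The names `dense`, `relu`, `ffn_forward` and the toy weights in `LAYERS` are made up for illustration and are not from any library. The only loop over layers runs a number of times fixed by the network's shape, never by the input values, so you could unroll it into straight-line arithmetic.

```python
def relu(v):
    # Elementwise max(0, x).
    return [max(0.0, x) for x in v]


def dense(weights, bias, v):
    # One affine layer: each row of `weights` is dotted with v, then a bias is added.
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def ffn_forward(layers, v):
    # Iterates exactly len(layers) times. The trip count depends only on the
    # architecture, not on the data, so this is a fixed composition of functions.
    for i, (weights, bias) in enumerate(layers):
        v = dense(weights, bias, v)
        if i < len(layers) - 1:  # ReLU on hidden layers only
            v = relu(v)
    return v


# Hypothetical toy network: 2 inputs -> 2 hidden units -> 1 output.
LAYERS = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0]),
    ([[2.0, 1.0]], [0.5]),
]

print(ffn_forward(LAYERS, [3.0, 1.0]))  # [5.5]
```

Every input takes the same two-layer path, which is why a single forward pass needs nothing as strong as a Turing machine.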

Replies

musebox35 • today at 12:21 PM

I was about to post your last point / quote. Going multi-GPU is relatively not so tough, but once you go multi-node you have a distributed storage/IO/compute system, which is highly non-trivial. Add the long training times, and now you have robustness and fault-tolerance concerns with hardware failures and restarts. Today's training systems are engineering marvels.

zbendefy • today at 11:38 AM

Good point!

There is also a case for Markov chains being theoretically able to do this if tuned well. Or even the SAT problem.
