Hacker News

jmalicki, today at 1:54 PM

The softmax is the probability that the next token in the training data is a given token, conditioned on the inputs. The author apparently doesn't know that and thinks it was an arbitrary choice.
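For readers who haven't met the term: softmax turns a vector of raw scores (logits) into a distribution over the vocabulary, with every entry positive and the entries summing to 1. A minimal sketch, assuming a made-up four-token vocabulary and made-up logits (`vocab` and `logits` are hypothetical and don't come from the essay being discussed):

```python
import math

def softmax(logits):
    """Map raw scores to probabilities that are positive and sum to 1."""
    # Subtracting the max avoids overflow in exp() and doesn't change the result.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy vocabulary and next-token logits for some context.
vocab = ["the", "cat", "sat", "mat"]
logits = [2.0, 1.0, 0.1, -1.0]

probs = softmax(logits)
for token, p in zip(vocab, probs):
    print(f"{token}: {p:.3f}")
```

A higher logit always maps to a higher probability, and only differences between logits matter, since adding a constant to all of them cancels out in the ratio.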

The author's essay on the sigmoid shows the same gap: it misses that the sigmoid comes from somewhere and isn't an arbitrary choice.
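One standard way to see that the sigmoid isn't arbitrary: it is exactly the two-class case of softmax (the probability of class 1 when the logits are `z` and `0`), and its inverse is the log-odds function. A small Python check of both identities; this is a sketch, and the helper names are hypothetical:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# sigmoid(z) equals the first softmax output for logits [z, 0]:
# e^z / (e^z + e^0) = 1 / (1 + e^-z)
for z in [-3.0, -0.5, 0.0, 1.2, 4.0]:
    assert math.isclose(sigmoid(z), softmax([z, 0.0])[0])

# The inverse of the sigmoid is the log-odds, so z is the log-odds of p.
p = sigmoid(1.2)
print(math.log(p / (1 - p)))  # recovers 1.2, up to floating-point rounding
```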


Replies

canjobear, today at 3:39 PM

The softmax, after the network has been trained, yields an estimate of the probability in the training data, but it is not that probability itself.
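To make the distinction concrete, here is a toy sketch that assumes the simplest possible "model": one free logit per token, with no conditioning on context, trained by plain gradient descent on cross-entropy against made-up token counts. The gradient of the mean cross-entropy with respect to the logits is softmax(logits) minus the empirical frequencies. Training therefore pushes the softmax toward the training-data frequencies, but after finitely many steps it is only an estimate of them:

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical training data: how often each of three tokens follows some context.
counts = [6, 3, 1]
n = sum(counts)
empirical = [c / n for c in counts]  # [0.6, 0.3, 0.1]

def train(steps, lr=1.0):
    """Gradient descent on mean cross-entropy; returns the trained softmax."""
    logits = [0.0, 0.0, 0.0]
    for _ in range(steps):
        probs = softmax(logits)
        # d(mean cross-entropy)/d(logit_i) = probs_i - empirical_i
        logits = [z - lr * (p - q) for z, p, q in zip(logits, probs, empirical)]
    return softmax(logits)

for steps in (1, 10, 1000):
    est = train(steps)
    print(steps, [round(p, 3) for p in est])
```

In this toy the estimate converges to the counts because the model can represent them exactly. A real network shares parameters across contexts and is not trained to exact convergence, so nothing guarantees its softmax matches the training-data frequencies this closely.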
