Hacker News

ComplexSystems today at 11:00 AM

Good article, but

"We take the exponential of each input and normalize by the sum of all exponentials. This transforms a vector of arbitrary real numbers into values between 0 and 1 that sum to 1. Technically this is a pseudo-probability distribution (they're not derived from a probability space), but it's close enough to a probability distribution, and for practical purposes they work just fine."

Why is this a "pseudo-probability distribution"?


Replies

WCSTombs today at 12:04 PM

Mathematically, it is literally a probability distribution, because it fits the definition of a measure whose total mass is one, so I think the language is just imprecise. What they may be trying to say is that semantically it doesn't arise in a principled way from an uncertainty model, such as from Bayesian or frequentist statistics.
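To make that concrete, here is a minimal Python sketch of the check. The helper names `softmax` and `is_probability_distribution` and the example inputs are my own, not from the article, and the max-subtraction is a common numerical-stability trick the quoted passage doesn't mention. It just verifies the finite-set definition: non-negative masses whose total is one.

```python
import math

def softmax(logits):
    """Exponentiate each input and normalize by the sum of the exponentials."""
    # Subtracting the max leaves the result unchanged (it cancels in the ratio)
    # but keeps exp() from overflowing on large inputs.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def is_probability_distribution(p, tol=1e-12):
    """Finite-set definition: every mass is non-negative and the total mass is one."""
    return all(q >= 0.0 for q in p) and math.isclose(sum(p), 1.0, abs_tol=tol)

logits = [2.0, -1.0, 0.5]  # arbitrary illustrative inputs
probs = softmax(logits)
print(probs, is_probability_distribution(probs))
```

Whatever real numbers go in, the output passes the check, which is all the definition asks for. Whether those numbers mean anything as calibrated uncertainties is a separate, semantic question.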

rhdunn today at 11:59 AM

The comment in parentheses mentions "they're not derived from a probability space" [1]. I don't know enough about probability spaces or softmax to say which part of a probability space is missing here compared to other probability distributions, or how other probability distributions satisfy the definition of a probability space.

[1] https://en.wikipedia.org/wiki/Probability_space
