✏️ Gumbel Softmax

Tue, 22 Jul 2025 00:00:00 +0000

Motivation. In many models we need to select a discrete option inside the computation graph (e.g., pick one branch of a network). A hard argmax is non-differentiable, so gradients can’t flow through it. Gumbel-Softmax provides a continuous, differentiable approximation to this discrete sampling step.

Gumbel-Max Trick

Assume we have a discrete distribution:

| $X$ | 1 | 2 | 3 |
|-----|---|---|---|
| $p$ | 0.2 | 0.3 | 0.5 |

We want to draw samples of $X$ that follow this distribution. If we sample from the distribution directly, $X$ is not computed from $p$ by any differentiable function, so $X$ cannot be differentiated with respect to $p$, and backpropagation cannot pass through the sampling step.
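As a minimal sketch (Python standard library only; the helper names `sample_direct`, `sample_gumbel_max` and `empirical_frequencies` are hypothetical), the snippet below draws samples of $X$ from the table above in two ways. Direct inverse-CDF sampling compares a uniform draw against cumulative sums of $p$, so its output is a step function of $p$ with no useful gradient. The Gumbel-Max trick instead writes the sample as $X = \arg\max_i (\log p_i + g_i)$ with $g_i = -\log(-\log u_i)$ and $u_i \sim \mathrm{Uniform}(0, 1)$: the randomness moves into noise that does not depend on $p$, and only the argmax remains non-differentiable, which Gumbel-Softmax relaxes by replacing it with a temperature-controlled softmax.

```python
import math
import random
from collections import Counter

# Distribution from the table: P(X=1)=0.2, P(X=2)=0.3, P(X=3)=0.5
VALUES = [1, 2, 3]
PROBS = [0.2, 0.3, 0.5]


def sample_direct(probs, rng):
    """Inverse-CDF sampling: the result is a step function of probs,
    so it gives no useful derivative with respect to probs."""
    u = rng.random()
    cumulative = 0.0
    for value, p in zip(VALUES, probs):
        cumulative += p
        if u < cumulative:
            return value
    return VALUES[-1]  # guard against floating-point round-off in the cumulative sum


def sample_gumbel_max(probs, rng):
    """Gumbel-Max trick: argmax over log-probabilities plus i.i.d. Gumbel(0, 1) noise.
    The noise is independent of probs; only the final argmax is non-differentiable."""
    scores = []
    for p in probs:
        u = rng.random()
        while u == 0.0:  # random() can return 0.0, which would make log(u) undefined
            u = rng.random()
        g = -math.log(-math.log(u))  # standard Gumbel(0, 1) sample
        scores.append(math.log(p) + g)
    best = max(range(len(scores)), key=lambda i: scores[i])
    return VALUES[best]


def empirical_frequencies(sampler, n, seed=0):
    """Draw n samples with a seeded RNG and return the fraction of each value."""
    rng = random.Random(seed)
    counts = Counter(sampler(PROBS, rng) for _ in range(n))
    return {v: counts[v] / n for v in VALUES}


if __name__ == "__main__":
    n = 100_000
    print("direct:    ", empirical_frequencies(sample_direct, n))
    print("gumbel-max:", empirical_frequencies(sample_gumbel_max, n))
```

Both samplers should report frequencies close to 0.2, 0.3 and 0.5. The point of the comparison is not speed but form: the Gumbel-Max version expresses the same distribution as a deterministic function of $\log p$ plus independent noise, which is the reparameterization that Gumbel-Softmax builds on.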

Math on ChengAo Shen
