Background: Sigmoid and Tanh

Sigmoid

Takes a vector and squashes every element into the [0, 1] range.

The "sigmoided" vector can be element-wise multiplied with another vector to apply a filter.

If you want to keep the first element and discard the second element of [5, 5], you can element-wise multiply it with [1, 0], right? Sigmoid is what lets you build (a soft version of) that [1, 0] filter.
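Here's a minimal sketch of that idea in NumPy; the large pre-activation values are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up pre-activations: a large positive value saturates toward 1,
# a large negative one toward 0, giving a soft version of the [1, 0] filter.
gate_logits = np.array([8.0, -8.0])
filter_ = sigmoid(gate_logits)   # ~[0.9997, 0.0003]

values = np.array([5.0, 5.0])
print(values * filter_)          # ~[4.998, 0.002]: keeps the first, discards the second
```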

Tanh

Takes a vector and squashes every element into the [-1, 1] range.
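A quick sketch of the squashing (the input values are arbitrary):

```python
import numpy as np

v = np.array([-100.0, -1.0, 0.0, 1.0, 100.0])
print(np.tanh(v))   # [-1. -0.7616  0.  0.7616  1.] -- every element lands in [-1, 1]
```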


Gates

Now on to the gates! Let's say the vector N is a concatenation of the previous hidden state and current input: [h[t-1], x[t]].
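Concretely, with hypothetical sizes (hidden size 3, input size 2):

```python
import numpy as np

h_prev = np.zeros(3)               # previous hidden state h[t-1], hypothetical size 3
x_t = np.array([0.5, -1.2])        # current input x[t], hypothetical size 2

N = np.concatenate([h_prev, x_t])  # N has size 3 + 2 = 5
print(N)                           # [ 0.   0.   0.   0.5 -1.2]
```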

Input Gate

Puts N through a sigmoid layer to get a filter that decides what parts of the input and hidden state to commit to the cell state (long-term memory), then element-wise multiplies that filter with a tanh'd N (the candidate values) to apply it.

The filter has the same dimension as the cell state (the sigmoid layer projects N down to the hidden-state size, squashing each element into the [0, 1] range), so the two can be element-wise multiplied.
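A minimal sketch of the input gate, assuming a hidden size of 3, an input size of 2, and random weights (the names W_i, b_i, W_c, b_c are mine, not from any particular library):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
hidden, inp = 3, 2                       # hypothetical sizes
N = rng.standard_normal(hidden + inp)    # [h[t-1], x[t]]

# Each "layer" is a weight matrix plus a bias; it projects N down to the hidden size.
W_i, b_i = rng.standard_normal((hidden, hidden + inp)), np.zeros(hidden)
W_c, b_c = rng.standard_normal((hidden, hidden + inp)), np.zeros(hidden)

i_t = sigmoid(W_i @ N + b_i)       # the [0, 1] filter
c_tilde = np.tanh(W_c @ N + b_c)   # candidate values in [-1, 1]
input_result = i_t * c_tilde       # filtered candidate to add to the cell state
```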

Forget Gate

Creates a filter the same way the input gate does (N through a sigmoid layer with its own weights), and applies it (element-wise multiplication) to the previous cell state.
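Continuing the same sketch (W_f, b_f, and c_prev are again hypothetical stand-ins):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
hidden, inp = 3, 2
N = rng.standard_normal(hidden + inp)   # [h[t-1], x[t]]
c_prev = rng.standard_normal(hidden)    # previous cell state

W_f, b_f = rng.standard_normal((hidden, hidden + inp)), np.zeros(hidden)
f_t = sigmoid(W_f @ N + b_f)   # the forget filter
forget_result = f_t * c_prev   # what survives from the old cell state
```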

Output Gate

Creates a filter the same way (N through a sigmoid layer with its own weights), and applies it (element-wise multiplication) to tanh(result of forget gate + result of input gate), i.e. the tanh of the new cell state. The product is the new hidden state.
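And the last step, putting the two previous results together (same hypothetical setup; the stand-in gate results would come from the sketches above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
hidden, inp = 3, 2
N = rng.standard_normal(hidden + inp)         # [h[t-1], x[t]]
input_result = rng.standard_normal(hidden)    # stand-in for the input gate's result
forget_result = rng.standard_normal(hidden)   # stand-in for the forget gate's result

W_o, b_o = rng.standard_normal((hidden, hidden + inp)), np.zeros(hidden)
o_t = sigmoid(W_o @ N + b_o)         # the output filter
c_t = forget_result + input_result   # the new cell state
h_t = o_t * np.tanh(c_t)             # the new hidden state
```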


Also

Each sigmoid above (and the candidate tanh in the input gate) is a full neural network layer with its own weights and bias; the final tanh on the cell state is just an activation. To grasp the essence of LSTMs, remove all the tanhs from the diagrams and study them again.
