LSTM Gates in 10 Lines
LSTM Gates explained in 10 lines or fewer!
Background: Sigmoid and Tanh
Sigmoid
Takes a vector, re-formats it nicely into a [0, 1] range.
The "sigmoided" vector can be element-wise multiplied with another vector to apply a filter.
If you want to keep the first element and discard the second element of [5, 5], you can element-wise multiply it with [1, 0], right? Sigmoid is what lets you build that [1, 0] filter.
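Here's a minimal NumPy sketch of that filtering idea; the raw scores are made up just to push sigmoid near its extremes:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

# Large positive scores sigmoid to ~1 (keep); large negative ones to ~0 (discard).
scores = np.array([10.0, -10.0])        # made-up pre-filter values
filt = sigmoid(scores)                  # ~[1, 0]
print(np.array([5.0, 5.0]) * filt)      # ~[5, 0]: first kept, second discarded
```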
Tanh
Takes a vector, re-formats it nicely into a [-1, 1] range.
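A quick check of that squashing, with made-up inputs:

```python
import numpy as np

# tanh squashes each element into the [-1, 1] range.
print(np.tanh(np.array([-10.0, -0.5, 0.0, 10.0])))  # ~[-1., -0.46, 0., 1.]
```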
Gates
Now on to the gates! Let's say the vector N is a concatenation of the previous hidden state and current input ([h[t-1], x[t]]).
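In code, that concatenation is one line; the sizes and values below are hypothetical:

```python
import numpy as np

h_prev = np.array([0.1, -0.2])      # hypothetical previous hidden state h[t-1]
x_t = np.array([0.5, 0.3, -0.1])    # hypothetical current input x[t]
N = np.concatenate([h_prev, x_t])   # [h[t-1], x[t]]
print(N.shape)                      # (5,) = hidden size + input size
```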
Input Gate
Puts N through sigmoid, gets the resulting filter that decides which parts of the input and hidden state to commit to the cell state (long-term memory), and element-wise multiplies it with tanh'd N to apply the filter.
The filter has the same dimension as the cell state, so the two can be element-wise multiplied (each element of the filter is squashed to the [0, 1] range by sigmoid).
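A rough sketch of the input gate, with random weights standing in for the learned layers ("N through sigmoid" really means a learned linear layer followed by sigmoid):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

rng = np.random.default_rng(0)
hidden, inp = 2, 3
N = rng.standard_normal(hidden + inp)   # stand-in for [h[t-1], x[t]]

W_i = rng.standard_normal((hidden, hidden + inp))  # stand-in learned weights
W_c = rng.standard_normal((hidden, hidden + inp))

i_filter = sigmoid(W_i @ N)           # the filter: [0, 1] values, cell-state sized
candidate = np.tanh(W_c @ N)          # "tanh'd N": candidate values in [-1, 1]
input_result = i_filter * candidate   # what gets committed to the cell state
```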
Forget Gate
Creates the same kind of filter as the input gate (N through its own sigmoid layer), and applies (element-wise multiplication) that filter to the previous cell state.
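The same recipe in sketch form, again with stand-in random weights; the only difference from the input gate is what the filter multiplies:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

rng = np.random.default_rng(1)
hidden, inp = 2, 3
N = rng.standard_normal(hidden + inp)   # stand-in for [h[t-1], x[t]]
c_prev = rng.standard_normal(hidden)    # stand-in previous cell state

W_f = rng.standard_normal((hidden, hidden + inp))  # the forget gate's own weights
f_filter = sigmoid(W_f @ N)
forget_result = f_filter * c_prev   # how much old memory survives
```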
Output Gate
Creates the same kind of filter as the input gate (N through its own sigmoid layer), and applies (element-wise multiplication) that filter to tanh(result of input gate + result of forget gate), i.e. the tanh of the updated cell state. The result is the new hidden state h[t].
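Putting all three gates together into one full LSTM step; everything here is a sketch with random stand-in weights, not a tuned implementation:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

rng = np.random.default_rng(2)
hidden, inp = 2, 3
N = rng.standard_normal(hidden + inp)   # stand-in for [h[t-1], x[t]]
c_prev = rng.standard_normal(hidden)    # stand-in previous cell state

def linear():
    # Stand-in for a learned linear layer (random weights, no bias).
    W = rng.standard_normal((hidden, hidden + inp))
    return lambda v: W @ v

f_filter = sigmoid(linear()(N))                   # forget gate's filter
i_filter = sigmoid(linear()(N))                   # input gate's filter
candidate = np.tanh(linear()(N))                  # input gate's candidates
c_new = f_filter * c_prev + i_filter * candidate  # forget result + input result
o_filter = sigmoid(linear()(N))                   # output gate's filter
h_new = o_filter * np.tanh(c_new)                 # new hidden state h[t]
print(h_new)
```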
Also
Each sigmoid and tanh here is a full neural network layer with its own learned weights (except the final tanh applied to the updated cell state, which is just an activation). To grasp the essence of LSTMs, remove all the tanh blocks from the diagrams and study them again.