#### Background: Sigmoid and Tanh

##### Sigmoid

Takes a vector and re-formats it nicely into the [0, 1] range.

The "sigmoided" vector can be element-wise multiplied with another vector to apply a filter.

If you want to keep the first element and discard the second element of `[5, 5]`, you can element-wise multiply it with `[1, 0]`, right? Sigmoid is what lets you build that `[1, 0]` filter.
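That filtering idea can be sketched in a couple of lines (a hypothetical NumPy example, not from the original notes):

```python
import numpy as np

v = np.array([5.0, 5.0])
filt = np.array([1.0, 0.0])  # an idealized filter: keep the first element, drop the second

filtered = v * filt          # element-wise multiplication applies the filter
print(filtered)              # [5. 0.]
```

In practice a sigmoid output lies strictly between 0 and 1, so the learned filter is a soft mask rather than a hard `[1, 0]`.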

##### Tanh

Takes a vector, re-formats it nicely into a [-1, 1] range.
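Both squashing functions can be sketched in a few lines (a hypothetical NumPy sketch; the `sigmoid` helper name is my own):

```python
import numpy as np

def sigmoid(x):
    # squashes each element into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-10.0, 0.0, 10.0])
print(sigmoid(x))   # close to [0, 0.5, 1]
print(np.tanh(x))   # close to [-1, 0, 1]
```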

#### Gates

Now on to the gates! Let's say the vector `N` is a concatenation of the previous hidden state and current input (`[h[t-1], x[t]]`).

##### Input Gate

Puts `N` through sigmoid, gets the resulting filter that decides what input and hidden state to commit to cell state (long-term memory), and element-wise multiplies it with tanh'd `N` to apply the filter.

The cell state has the same dimension as sigmoided `N`, i.e. the filter, so the two can be element-wise multiplied (each element of the filter is squashed to the [0, 1] range by sigmoid).
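The input gate can be sketched like this. Assumptions of mine: each sigmoid/tanh is a layer with its own weights `W` and bias `b` (as the note at the end says, each is a neural network layer), and the dimensions are made up for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
hidden = 4
h_prev = rng.standard_normal(hidden)   # previous hidden state h[t-1]
x_t = rng.standard_normal(3)           # current input x[t]
N = np.concatenate([h_prev, x_t])      # the concatenated vector N

# Each gate is a small layer: weights W, bias b (untrained random values here)
W_i, b_i = rng.standard_normal((hidden, N.size)), np.zeros(hidden)  # sigmoid layer
W_c, b_c = rng.standard_normal((hidden, N.size)), np.zeros(hidden)  # tanh layer

i_filter = sigmoid(W_i @ N + b_i)      # filter in (0, 1): what to commit to cell state
candidate = np.tanh(W_c @ N + b_c)     # candidate values in (-1, 1)
input_gate_out = i_filter * candidate  # element-wise multiply applies the filter
```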

##### Forget Gate

Creates the same kind of filter as the input gate (`N` through sigmoid, with its own weights), and applies that filter (element-wise multiplication) to the previous cell state.
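A matching sketch for the forget gate, under the same assumptions (own weights per gate, made-up dimensions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
hidden = 4
N = rng.standard_normal(7)             # [h[t-1], x[t]] concatenated
c_prev = rng.standard_normal(hidden)   # previous cell state (long-term memory)

W_f, b_f = rng.standard_normal((hidden, N.size)), np.zeros(hidden)  # the gate's own weights
f_filter = sigmoid(W_f @ N + b_f)      # filter in (0, 1): what to keep from memory
forget_gate_out = f_filter * c_prev    # element-wise: scales down what should be forgotten
```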

##### Output Gate

Creates the same kind of filter as the input gate (`N` through sigmoid, with its own weights), and applies that filter (element-wise multiplication) to `tanh(result of input gate + result of forget gate)`, i.e. the tanh'd new cell state, producing the new hidden state.
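The three gates above can be put together into one LSTM step, a sketch under the same assumptions (each gate has its own weights; the function and variable names are mine):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, params):
    W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o = params
    N = np.concatenate([h_prev, x_t])   # [h[t-1], x[t]]

    f = sigmoid(W_f @ N + b_f)          # forget gate filter
    i = sigmoid(W_i @ N + b_i)          # input gate filter
    c_cand = np.tanh(W_c @ N + b_c)     # candidate cell values
    o = sigmoid(W_o @ N + b_o)          # output gate filter

    c_new = f * c_prev + i * c_cand     # result of forget gate + result of input gate
    h_new = o * np.tanh(c_new)          # output gate filters the tanh'd new cell state
    return h_new, c_new

rng = np.random.default_rng(2)
hidden, inputs = 4, 3
params = tuple(rng.standard_normal((hidden, hidden + inputs)) for _ in range(4)) \
       + tuple(np.zeros(hidden) for _ in range(4))
h, c = lstm_step(rng.standard_normal(inputs), np.zeros(hidden), np.zeros(hidden), params)
```

Because the output gate's filter sits in (0, 1) and tanh maps into (-1, 1), every element of the new hidden state stays strictly inside (-1, 1).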

#### Also

Each sigmoid and tanh is a neural network layer. To grasp the *essence* of LSTMs, just remove all the tanh blocks from the diagrams and study them again.