Overview
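Long short-term memory (LSTM) networks were designed to address the vanishing gradient problem in standard RNNs, where gradients shrink as they are propagated back through many time steps. Their gated cell state lets them retain information over long sequences and discard what is no longer relevant.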
The LSTM Cell
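An LSTM cell contains three main gates, each producing values between 0 and 1 that control how much information passes through:
- Forget Gate: Decides what information to discard from the cell state.
- Input Gate: Decides which new information to store in the cell state.
- Output Gate: Decides what part of the cell state to expose as the hidden state.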
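The way the three gates combine in a single time step can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: it assumes a scalar cell (hidden size 1), and the names `lstm_step` and `params`, along with the hand-picked weights, are hypothetical and not taken from any library.

```python
import math

def sigmoid(x):
    # Squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    """One time step of a scalar LSTM cell (hidden size 1, for illustration)."""
    # Every gate sees the current input and the previous hidden state.
    def pre(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))       # forget gate: how much of c_prev to keep
    i = sigmoid(pre("input"))        # input gate: how much of the candidate to store
    o = sigmoid(pre("output"))       # output gate: how much of the cell state to expose
    g = math.tanh(pre("candidate"))  # candidate values proposed for the cell state

    c = f * c_prev + i * g           # updated cell state
    h = o * math.tanh(c)             # new hidden state (the cell's output)
    return h, c

# Hypothetical weights (w_x, w_h, bias), chosen by hand rather than learned.
params = {
    "forget": (0.5, 0.1, 1.0),
    "input": (0.6, -0.2, 0.0),
    "output": (0.4, 0.3, 0.0),
    "candidate": (0.9, 0.5, 0.0),
}

h, c = 0.0, 0.0
for x in [1.0, -0.5, 0.25]:
    h, c = lstm_step(x, h, c, params)
    print(f"x={x:+.2f}  h={h:+.4f}  c={c:+.4f}")
```

Besides the three gates, the sketch uses a tanh "candidate" layer that proposes the new values the input gate then scales. The cell state update is additive (`f * c_prev + i * g`), so when the forget gate stays close to 1 the stored value carries over from step to step largely unchanged.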
Impact
LSTMs were the state of the art for many sequence modeling tasks, such as machine translation and speech recognition, until the Transformer architecture was introduced in 2017.