Overview
In some models (especially RNNs), backpropagated gradients can become extremely large — the "exploding gradient" problem — causing weight updates so drastic that training becomes unstable. Gradient clipping limits gradients that exceed a predefined threshold, keeping each update step bounded.
Types
- Clip by Value: cap each gradient component independently to a fixed range, e.g. [-c, c]. This can change the direction of the gradient vector.
- Clip by Norm: rescale the entire gradient vector so its total length (L2 norm) does not exceed a limit, preserving its direction.
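The two variants above can be sketched in a few lines of NumPy. This is a minimal illustration, not a framework implementation; libraries such as PyTorch ship built-ins for this (e.g. `torch.nn.utils.clip_grad_value_` and `clip_grad_norm_`).

```python
import numpy as np

def clip_by_value(grad, clip_value):
    # Cap each gradient component independently to [-clip_value, clip_value].
    return np.clip(grad, -clip_value, clip_value)

def clip_by_norm(grad, max_norm):
    # If the L2 norm exceeds max_norm, scale the whole vector down
    # uniformly so its norm equals max_norm; direction is preserved.
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

grad = np.array([3.0, -4.0])       # L2 norm = 5.0
print(clip_by_value(grad, 1.0))    # [ 1. -1.]  (direction changed)
print(clip_by_norm(grad, 1.0))     # [ 0.6 -0.8] (same direction, norm 1.0)
```

Note the difference in the example: clip-by-value distorts the gradient's direction, while clip-by-norm only shrinks its magnitude.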