Overview

In some models (especially RNNs), gradients can become extremely large (the "exploding gradients" problem), causing weight updates so drastic that training becomes unstable. Gradient clipping limits the gradients before the optimizer step: whenever they exceed a predefined threshold, they are scaled or capped back to that threshold.

Types

  • Clip by Value: capping each gradient component individually to a fixed range, e.g. [-c, c]. This can change the gradient's direction.
  • Clip by Norm: rescaling the entire gradient vector so its total length (L2 norm) does not exceed a limit, which preserves the gradient's direction.
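
Both types can be sketched in plain Python (deep-learning frameworks provide equivalents, e.g. PyTorch's `clip_grad_value_` and `clip_grad_norm_`); the gradient here is represented as a simple list of floats:

```python
import math

def clip_by_value(grads, clip_val):
    # Cap each component independently to the range [-clip_val, clip_val].
    return [max(-clip_val, min(clip_val, g)) for g in grads]

def clip_by_norm(grads, max_norm):
    # Rescale the whole vector only if its L2 norm exceeds max_norm,
    # so the direction is preserved.
    norm = math.sqrt(sum(g * g for g in grads))
    if norm > max_norm:
        scale = max_norm / norm
        return [g * scale for g in grads]
    return list(grads)

g = [3.0, -4.0]                 # L2 norm = 5.0
print(clip_by_value(g, 2.5))    # [2.5, -2.5] — direction changed
print(clip_by_norm(g, 2.5))     # [1.5, -2.0] — scaled by 0.5, direction kept
```

Note how clip-by-value distorts the update direction (both components hit the cap), while clip-by-norm shrinks the vector uniformly.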