Overview
ReLU (Rectified Linear Unit) is the default activation function for most deep learning models. Its mathematical form is f(x) = max(0, x): it passes positive inputs through unchanged and maps negative inputs to zero.
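The definition above is a single element-wise max, which is a minimal sketch in NumPy (the function name `relu` is just illustrative):

```python
import numpy as np

def relu(x):
    # f(x) = max(0, x), applied element-wise
    return np.maximum(0, x)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
relu(x)  # negatives become 0.0; positives pass through
```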
Advantages
- Computational Efficiency: Just a threshold at zero (a single comparison), with no exponentials to evaluate, so it is very cheap to compute compared to Sigmoid or Tanh.
- Reduces Vanishing Gradient: Its gradient is 1 for all positive inputs, so gradients do not shrink as they propagate through many layers, mitigating the vanishing-gradient problem that Sigmoid and Tanh suffer from in deep networks.
- Sparsity: Negative inputs are mapped to exactly zero, so a large fraction of activations in a layer are zero, which can yield sparser representations and cheaper downstream computation.
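The sparsity point can be seen directly: for roughly zero-mean pre-activations, about half the inputs are negative, so ReLU zeroes about half the outputs. A small illustrative check (the variable names are just for this sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
pre_activations = rng.standard_normal(10_000)  # zero-mean, so ~half are negative
activations = np.maximum(0, pre_activations)   # ReLU
sparsity = np.mean(activations == 0)           # fraction of exact zeros, ~0.5 here
```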
Variants
- Leaky ReLU: Outputs a small multiple of the input (e.g. 0.01x) instead of zero when the input is negative, keeping a non-zero gradient there. This prevents the "dying ReLU" problem, where neurons that only receive negative inputs stop updating entirely.
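Leaky ReLU can be sketched the same way; the slope 0.01 below is a commonly used default, not a fixed part of the definition:

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # f(x) = x if x > 0, else alpha * x  (small non-zero slope for negatives)
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, 0.0, 3.0])
leaky_relu(x)  # the negative input maps to a small negative value, not 0
```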