Overview

ReLU (Rectified Linear Unit) is the default activation function for most deep learning models. It is defined as f(x) = max(0, x): positive inputs pass through unchanged, and negative inputs are clipped to zero.
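
A minimal sketch of the function and its derivative in plain NumPy (the names relu and relu_grad are illustrative, not from any particular library):

  import numpy as np

  def relu(x):
      # Elementwise max(0, x): negatives become 0, positives pass through unchanged.
      return np.maximum(0, x)

  def relu_grad(x):
      # Derivative is 1 for x > 0 and 0 for x < 0; it is undefined at exactly 0,
      # where returning 0 is a common convention.
      return (x > 0).astype(x.dtype)

  x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
  print(relu(x))       # [0.  0.  0.  0.5 2. ]
  print(relu_grad(x))  # [0. 0. 0. 1. 1.]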

Advantages

  • Computational Efficiency: Requires only a comparison and a threshold, with no exponentials as in Sigmoid or Tanh.
  • Reduces Vanishing Gradient: The gradient is exactly 1 for every positive input, so it does not shrink as it propagates backward through many layers the way the saturating gradients of Sigmoid or Tanh do (see the sketch after this list).
  • Sparsity: Negative inputs are mapped to exactly zero, so only a subset of neurons is active for any given input, which can be beneficial for efficiency and representation.
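
To make the vanishing-gradient point concrete, the sketch below compares the two derivatives as inputs grow; this is a plain NumPy illustration, not framework code:

  import numpy as np

  def sigmoid_grad(x):
      # Sigmoid derivative s(x) * (1 - s(x)); it peaks at 0.25 and decays
      # toward zero for large |x|, which is what makes gradients vanish.
      s = 1.0 / (1.0 + np.exp(-x))
      return s * (1.0 - s)

  x = np.array([1.0, 5.0, 10.0])
  print(sigmoid_grad(x))        # ~[0.197 0.0066 0.00005] -- shrinks toward zero
  print((x > 0).astype(float))  # [1. 1. 1.] -- ReLU's gradient stays at 1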

Variants

  • Leaky ReLU: Scales negative inputs by a small slope (commonly α = 0.01) instead of zeroing them, preserving a non-zero gradient and preventing 'dying' neurons that stop updating; a sketch follows below.
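
A minimal sketch under the common α = 0.01 default (the alpha parameter name here is illustrative):

  import numpy as np

  def leaky_relu(x, alpha=0.01):
      # Positives pass through; negatives are scaled by alpha, so the
      # gradient on the negative side is alpha rather than zero.
      return np.where(x > 0, x, alpha * x)

  x = np.array([-2.0, -0.5, 0.5, 2.0])
  print(leaky_relu(x))  # [-0.02  -0.005  0.5  2.]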

Related Terms