Overview
Batch Normalization addresses the problem of 'internal covariate shift,' where the distribution of inputs to a layer changes as the weights of previous layers are updated.
Benefits
- Faster Training: Allows for higher learning rates.
- Stability: Makes the network less sensitive to the initial weight values.
- Regularization: Has a slight regularizing effect, often reducing the need for dropout.
How it Works
For each feature, it computes the mean and variance over the current mini-batch, normalizes the inputs to zero mean and unit variance (adding a small epsilon for numerical stability), and then applies two learnable parameters, gamma (scale) and beta (shift), so the network can undo the normalization if that turns out to be optimal.
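The steps above can be sketched as a minimal NumPy forward pass at training time. This is an illustrative sketch, not a full implementation: `gamma`, `beta`, and `eps` are assumed names, and it omits the running statistics used at inference time.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Training-time batch norm over a (batch, features) array.

    gamma and beta are the learnable scale and shift parameters,
    one per feature; eps guards against division by zero.
    """
    mean = x.mean(axis=0)            # per-feature batch mean
    var = x.var(axis=0)              # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)  # normalize to ~N(0, 1)
    return gamma * x_hat + beta      # learnable rescale and shift

# Usage: with gamma=1 and beta=0, the output has roughly
# zero mean and unit variance per feature.
x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
out = batch_norm_forward(x, gamma=np.ones(2), beta=np.zeros(2))
```

Because gamma and beta are learned, the layer can represent the identity transform (gamma = sqrt(var + eps), beta = mean), which is why normalization does not reduce the network's expressive power.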