Overview
Batch Normalization addresses the problem of 'internal covariate shift,' where the distribution of inputs to a layer changes as the weights of previous layers are updated.
Benefits
- Faster Training: Allows for higher learning rates.
- Stability: Makes the network less sensitive to the initial weight values.
- Regularization: Has a slight regularizing effect, often reducing the need for dropout.
How it Works
For each feature, it computes the mean and variance over the current mini-batch, normalizes the inputs to zero mean and unit variance (adding a small epsilon for numerical stability), and then applies two learnable parameters, gamma (scale) and beta (shift), so the network can undo the normalization if that turns out to be optimal.
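The steps above can be sketched as a minimal NumPy forward pass at training time. This is an illustrative sketch, not a full implementation: `gamma`, `beta`, and `eps` are assumed names, and it omits the running statistics used at inference time.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Training-time batch norm over a (batch, features) array.

    gamma and beta are the learnable scale and shift parameters,
    one per feature; eps guards against division by zero.
    """
    mean = x.mean(axis=0)            # per-feature batch mean
    var = x.var(axis=0)              # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)  # normalize to ~N(0, 1)
    return gamma * x_hat + beta      # learnable rescale and shift

# Usage: with gamma=1 and beta=0, the output has roughly
# zero mean and unit variance per feature.
x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
out = batch_norm_forward(x, gamma=np.ones(2), beta=np.zeros(2))
```

Because gamma and beta are learned, the layer can represent the identity transform (gamma = sqrt(var + eps), beta = mean), which is why normalization does not reduce the network's expressive power.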