Overview

Weight decay is a form of L2 regularization. By adding a penalty proportional to the squared magnitude of the weights to the loss, it pushes the model toward small weights, which yields a simpler, smoother function that is less likely to overfit.
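A minimal sketch of the idea, using a hypothetical toy linear-regression setup with plain gradient descent (all names and values here are illustrative, not from the original text). Adding `weight_decay * w` to the gradient is the update-rule form of the L2 penalty (lambda/2)*||w||^2, and it shrinks the learned weights toward zero:

```python
import numpy as np

# Hypothetical toy data: linear model y = X @ w with squared-error loss.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_w = np.array([3.0, 0.0, 0.0, 0.0, 0.0])
y = X @ true_w + rng.normal(scale=0.1, size=100)

def train(weight_decay, lr=0.1, steps=500):
    w = np.zeros(5)
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of the MSE loss
        # Weight decay: each step also pulls the weights toward zero,
        # equivalent here to minimizing loss + (weight_decay/2) * ||w||^2.
        w -= lr * (grad + weight_decay * w)
    return w

w_plain = train(weight_decay=0.0)
w_decayed = train(weight_decay=0.5)
# The decayed weight vector has a smaller L2 norm than the unregularized one.
print(np.linalg.norm(w_plain), np.linalg.norm(w_decayed))
```

For plain gradient descent the two views (a penalty term in the loss, or a shrinkage step in the update) coincide; for adaptive optimizers they can differ, which is why some libraries treat "weight decay" as its own hyperparameter.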

Why it Works

Large weights often indicate that the model is relying too heavily on specific features or on noise in the training data. Small weights encourage the model to spread its 'attention' more evenly across features.
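One way to see this spreading effect is ridge regression (linear regression with an L2 penalty) on two perfectly correlated features. This is an illustrative sketch with made-up data: without the penalty the problem has no unique solution, while with it the weight is split evenly between the duplicate features rather than piled onto one:

```python
import numpy as np

# Illustrative data: feature 2 is an exact copy of feature 1.
rng = np.random.default_rng(1)
x = rng.normal(size=(50, 1))
X = np.hstack([x, x])       # two perfectly correlated columns
y = 2.0 * x[:, 0]           # target depends on the shared signal

lam = 1.0
# Ridge closed form: w = (X^T X + lam * I)^{-1} X^T y.
# Without lam, X^T X is singular here (duplicate columns).
w = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
print(w)  # the two weights come out equal: the penalty splits the credit
```

By symmetry the penalized solution assigns identical weights to the two copies, which is the "distribute attention evenly" behavior described above.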

Related Terms