Overview

Instead of updating the model after every single example (which is slow) or only after processing the entire dataset (which requires too much memory), we split the training data into 'batches' and update the model once per batch.
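The splitting step can be sketched in a few lines; this is a minimal illustration (the function name and toy dataset are assumptions, not from any particular library):

```python
# Minimal sketch: split a dataset into mini-batches.
def iter_batches(data, batch_size):
    """Yield successive slices of `data`, each of length <= batch_size."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]

dataset = list(range(10))                      # 10 toy examples
batches = list(iter_batches(dataset, batch_size=4))
print(batches)                                 # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Note that the last batch may be smaller than the others; training frameworks typically either keep it or drop it via an option.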

Tradeoffs

  • Small Batch Size: More frequent updates and more 'noise' in the gradient estimates (which can help escape local minima), but slower wall-clock training because the hardware is underutilized.
  • Large Batch Size: Faster training (better use of GPU parallelism) and more stable updates, but higher memory usage and sometimes poorer generalization.
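The 'noise' half of this tradeoff can be seen directly: the gradient computed on a batch is an estimate of the full-dataset gradient, and its spread shrinks as the batch grows. A small self-contained sketch (the toy regression problem, the parameter value, and the helper names are all assumptions for illustration):

```python
import random

# Toy illustration of gradient noise: fit y ~ w*x to noisy data and
# measure how much the batch gradient varies for two batch sizes.
random.seed(0)
data = [(x, 2.0 * x + random.gauss(0, 1)) for x in range(100)]  # y ≈ 2x + noise
w = 0.0                                                          # current parameter

def batch_gradient(batch, w):
    """Gradient of the mean squared error 0.5*(w*x - y)**2 over the batch."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)

def grad_std(batch_size, trials=200):
    """Spread of the gradient estimate across random batches of one size."""
    grads = [batch_gradient(random.sample(data, batch_size), w)
             for _ in range(trials)]
    mean = sum(grads) / len(grads)
    return (sum((g - mean) ** 2 for g in grads) / len(grads)) ** 0.5

print(grad_std(4), grad_std(64))   # small batches give noisier gradient estimates
```

The size-4 batches produce a visibly larger spread than the size-64 batches, which is exactly the noisy-but-cheap versus stable-but-costly tradeoff described above.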

Common Sizes

Typically powers of two, such as 16, 32, 64, or 128; powers of two are conventional because they tend to map cleanly onto GPU memory and parallel hardware.
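One practical consequence of the choice is how many update steps an epoch takes. A quick sketch (the dataset size of 50,000 is an illustrative assumption):

```python
import math

# How batch size determines the number of update steps per epoch.
n_examples = 50_000
for batch_size in (16, 32, 64, 128):
    steps = math.ceil(n_examples / batch_size)   # last partial batch still counts
    print(f"batch_size={batch_size:>4} -> {steps} updates per epoch")
```

Doubling the batch size roughly halves the number of updates per epoch, which is where the speed side of the tradeoff comes from.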

Related Terms