Overview
Optimizers are the 'engines' of training: they use the gradients computed during backpropagation to decide exactly how to update the model's weights at each step.
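The core idea can be sketched as one update rule: step each weight against its gradient, scaled by a learning rate. A minimal sketch on a 1-D quadratic loss (the loss function, starting point, and learning rate here are illustrative choices, not from the text):

```python
# Minimal sketch of the basic optimizer step: gradient descent on the
# 1-D loss L(w) = (w - 3)^2, whose gradient is dL/dw = 2 * (w - 3).
def grad(w):
    return 2.0 * (w - 3.0)

w = 0.0    # initial weight (hypothetical)
lr = 0.1   # learning rate (hypothetical)
for _ in range(100):
    w -= lr * grad(w)   # the update rule: step against the gradient

print(round(w, 4))  # approaches the minimizer w = 3
```

Every optimizer below is a variation on this loop; they differ in how the step size and direction are adjusted over time.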
Popular Optimizers
- SGD: The baseline; updates each weight by a fixed fraction (the learning rate) of its gradient, often combined with momentum.
- Adam: The most popular default today; it maintains an 'adaptive' learning rate for each parameter using running estimates of the gradient's first and second moments.
- RMSprop: Scales the learning rate by a running average of recent squared gradients; historically popular for recurrent neural networks.
- Adagrad: Accumulates squared gradients over all steps, so rarely updated parameters take larger steps; well suited to sparse data.
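To make the SGD/Adam contrast concrete, here is a hedged from-scratch sketch of both update rules applied to the same 1-D loss L(w) = w² (gradient 2w). The hyperparameter values are the common textbook defaults, used here for illustration:

```python
import math

def sgd_step(w, g, lr=0.1):
    # Fixed step: the same learning rate for every parameter, every step.
    return w - lr * g

def adam_step(w, g, state, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * g        # running mean of gradients (1st moment)
    v = b2 * v + (1 - b2) * g * g    # running mean of squared gradients (2nd moment)
    m_hat = m / (1 - b1 ** t)        # bias correction for zero initialization
    v_hat = v / (1 - b2 ** t)
    # Adaptive step: effective learning rate shrinks where gradients are large.
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, (m, v, t)

w_sgd = w_adam = 5.0
adam_state = (0.0, 0.0, 0)
for _ in range(200):
    w_sgd = sgd_step(w_sgd, 2 * w_sgd)
    w_adam, adam_state = adam_step(w_adam, 2 * w_adam, adam_state)

print(abs(w_sgd), abs(w_adam))  # both move toward the minimum at w = 0
```

In practice you would use a library implementation (e.g. `torch.optim.Adam`) rather than hand-rolling these, but the moment estimates above are what 'adaptive learning rates' refers to.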
Goal
To drive the loss function as low as possible, as quickly and reliably as possible. For the non-convex losses of deep networks, finding the true global minimum is generally intractable; in practice the goal is a good low-loss region rather than a guaranteed global optimum.