Overview

Attention revolutionized sequence modeling by allowing models to handle long inputs more effectively. Instead of compressing an entire input into a single fixed-length vector, the model can 'look back' at the most relevant parts of the input as needed.

How it Works

The model calculates 'attention weights' for each part of the input, indicating how relevant that part is to the current prediction step. These weights, which are normalized to sum to 1, are then used to form a weighted sum of the input features, producing a context vector that emphasizes the most relevant information.
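The scoring-and-weighting process above can be sketched in a few lines of NumPy. The specific feature vectors and query below are made up for illustration; the essential steps are computing relevance scores, normalizing them with a softmax, and taking the weighted sum:

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability before exponentiating
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Hypothetical input: 4 positions, each represented by a 3-dim feature vector
features = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
])

# A query vector representing what the model is currently "looking for"
query = np.array([1.0, 0.5, 0.0])

# Relevance scores: similarity of each input position to the query
scores = features @ query

# Attention weights: scores normalized so they sum to 1
weights = softmax(scores)

# Context vector: weighted sum of the input features
context = weights @ features
```

Positions whose features align with the query receive larger weights, so the resulting context vector is dominated by the most relevant inputs rather than an unweighted average.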

Evolution

Originally developed for machine translation with RNN encoder-decoder models, attention later became the core building block of the Transformer architecture in the form of 'Self-Attention,' in which a sequence attends to its own positions.
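A minimal sketch of self-attention, assuming the standard scaled dot-product formulation used in Transformers: each position projects its embedding into a query, key, and value, and every position attends over all positions of the same sequence. The dimensions and random projection matrices here are illustrative, not from any particular model:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Every position produces its own query, key, and value vectors
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    # Scaled dot-product attention: each row of `weights` sums to 1
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    # Each output position is a weighted sum over all value vectors
    return weights @ V, weights

d = 4
X = rng.standard_normal((5, d))              # 5 positions, d-dim embeddings
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out, w = self_attention(X, Wq, Wk, Wv)
```

Because queries, keys, and values all come from the same sequence, each position can draw information from every other position in a single step, which is what lets Transformers dispense with recurrence.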

Related Terms