Overview

Self-attention (or intra-attention) allows each word in a sequence to 'attend' to every other word, including itself, helping the model capture context and relationships across the whole sequence.

Example

In the sentence 'The animal didn't cross the street because it was too tired,' self-attention helps the model realize that 'it' refers to the 'animal,' not the 'street.'

Components

Each input token is projected into three vectors: a Query, a Key, and a Value. The attention score for a word is computed by comparing its Query with the Keys of every word in the sequence (including itself); the scores are scaled, passed through a softmax to form attention weights, and those weights are used to take a weighted sum of the Values.
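The computation described above can be sketched in a few lines of NumPy. This is a minimal single-head example; the projection matrices `W_q`, `W_k`, `W_v`, the sequence length, and the embedding size are all illustrative choices, not values from any particular model.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a sequence of embeddings X."""
    Q = X @ W_q  # one Query vector per input position
    K = X @ W_k  # one Key vector per input position
    V = X @ W_v  # one Value vector per input position
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # compare each Query with every Key
    # softmax over each row turns scores into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # weighted sum of Values: context-aware output

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))  # 5 tokens, embedding dimension 8 (illustrative)
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)  # one output vector per token
```

Because every row of the weight matrix sums to 1, each output is a convex combination of the Value vectors, which is what lets a word like 'it' draw most of its representation from 'animal'.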

Related Terms