Artificial Intelligence & Machine Learning

Attention Mechanism

Definition

An attention mechanism is a technique in neural networks that lets a model focus on the most relevant parts of the input sequence when producing each element of the output. It is loosely inspired by cognitive attention in humans.

Why It Matters

Attention was the key innovation behind the Transformer architecture. It helps a model capture context and long-range dependencies in sequences such as text or speech.

Contextual Example

In a machine translation task, when the model generates each translated word, the attention mechanism lets it focus on the most relevant source words, regardless of their position in the sentence.

Common Misunderstandings

  • Attention and self-attention are not interchangeable terms: self-attention is a specific type of attention in which the model relates different positions of a single sequence to compute a representation of that sequence. It is the core building block of the Transformer.
  • Attention scores are not fixed importance labels: they are computed dynamically at each output step and indicate how much each input element contributes to that step.
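The idea above can be sketched in code. The following is a minimal, illustrative implementation of scaled dot-product self-attention (the variant used in Transformers), assuming NumPy and randomly generated toy embeddings; real models would learn separate projection matrices for the queries, keys, and values.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V have shape (seq_len, d).
    # Each row of `scores` says how strongly that position
    # attends to every position in the sequence.
    d = Q.shape[-1]
    scores = softmax(Q @ K.T / np.sqrt(d))  # (seq_len, seq_len)
    return scores @ V, scores

# Toy self-attention: Q, K, and V all come from the same sequence.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))  # 4 tokens, 8-dim embeddings
output, scores = scaled_dot_product_attention(x, x, x)
```

Each row of `scores` sums to 1, so it can be read as a distribution over input positions showing where the model "looks" when producing that position's output.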

Related Terms

Last Updated: December 17, 2025