Artificial Intelligence & Machine Learning

Transformer

Definition

A Transformer is a deep learning model architecture introduced in the 2017 paper "Attention Is All You Need." It is built around self-attention, a mechanism that lets the model weigh the importance of each token in the input relative to every other token.
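The core of that mechanism, scaled dot-product self-attention, can be sketched in a few lines. The following is a minimal NumPy illustration, not the full Transformer from the paper: the dimensions, random weight matrices, and function names here are invented for the example, and real models add multiple heads, masking, and learned parameters.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token vectors X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # weights[i, j] = how much token i attends to token j; each row sums to 1.
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    return weights @ V, weights

# Toy example: 4 "tokens", embedding dim 8, projection dim 4 (all illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
out, weights = self_attention(X, Wq, Wk, Wv)
```

Each row of `weights` is a probability distribution over the input tokens, which is exactly how the model "weighs importance": a high entry means that token contributes strongly to the output representation at that position.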

Why It Matters

The Transformer architecture revolutionized NLP and is the foundation for almost all modern Large Language Models, including GPT and Gemini. Its self-attention mechanism is particularly effective at handling long-range dependencies in language.

Contextual Example

In the sentence "The cat sat on the mat, it was happy," a Transformer's attention mechanism can link the pronoun "it" back to "cat," even though several words separate them.

Common Misunderstandings

  • Transformers were not the first models for sequence tasks: RNNs and LSTMs came earlier, but they processed tokens one step at a time and struggled to carry information across long sequences.
  • The "T" in GPT stands for "Transformer" (Generative Pre-trained Transformer), not "Text."

Related Terms

Self-Attention, Large Language Model (LLM), GPT, RNN, LSTM

Last Updated: December 17, 2025