Artificial Intelligence & Machine Learning
Transformer
Definition
A Transformer is a deep learning architecture introduced in the 2017 paper "Attention Is All You Need." It is built around self-attention, a mechanism that weighs the relevance of every token in the input against every other token, letting the model decide which parts of the text matter most when interpreting each word.
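The core operation is scaled dot-product attention from the paper: Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. Below is a minimal NumPy sketch of that formula; the function name, toy shapes, and random inputs are illustrative choices, not the paper's reference code.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise similarity of queries and keys
    # Numerically stable softmax over each row of scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output is a weighted mix of the values

# Toy example: 3 tokens with 4-dimensional embeddings (shapes are illustrative).
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(x, x, x)  # self-attention: Q, K, V all derive from x
print(out.shape)  # (3, 4)
```

In a full Transformer, Q, K, and V are learned linear projections of the token embeddings, and several attention "heads" run in parallel, but the weighted-sum computation above is the heart of the mechanism.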
Why It Matters
The Transformer architecture revolutionized NLP and is the foundation for almost all modern Large Language Models, including GPT and Gemini. Its self-attention mechanism is particularly effective at handling long-range dependencies in language, because every token can attend directly to every other token regardless of how far apart they sit in the sequence.
Contextual Example
In the sentence "The cat sat on the mat, it was happy," a Transformer's attention mechanism can resolve "it" to "cat," even though several words separate them.
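A toy illustration of this: if the vector for "it" is most similar to the vector for "cat," the attention weights concentrate there. The 2-D embeddings below are hand-picked to make that happen; real models learn such representations from data.

```python
import numpy as np

# Hand-crafted 2-D embeddings, chosen so "it" points toward "cat".
# Real Transformers learn these representations during training.
tokens = ["The", "cat", "sat", "on", "the", "mat", "it"]
E = np.array([
    [0.1, 0.0],   # The
    [1.0, 0.9],   # cat
    [0.0, 0.2],   # sat
    [0.1, 0.1],   # on
    [0.1, 0.0],   # the
    [0.2, 0.1],   # mat
    [0.9, 1.0],   # it
])

d_k = E.shape[-1]
query = E[tokens.index("it")]        # "it" asks: which token am I about?
scores = E @ query / np.sqrt(d_k)    # similarity of every token to "it"
weights = np.exp(scores) / np.exp(scores).sum()

for tok, w in zip(tokens, weights):
    print(f"{tok:>4}: {w:.2f}")      # "cat" (and "it" itself) get the most weight
```

Running this prints attention weights of roughly 0.31 each for "cat" and "it," versus under 0.10 for the other tokens: the mechanism has, in effect, linked the pronoun to its referent.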
Common Misunderstandings
- Transformers were not the first models for sequence tasks: RNNs and LSTMs came earlier, but they struggled with long sequences because information had to pass step by step through the recurrence.
- The "T" in GPT (Generative Pre-trained Transformer) stands for Transformer.