Artificial Intelligence & Machine Learning

Self-Supervised Learning

Definition

Self-supervised learning is a type of machine learning where the model learns from the data itself without explicit human-provided labels. It does this by creating its own labels from the input data, by solving a "pretext task."

Why It Matters

Self-supervised learning has been a major breakthrough, especially in NLP. It allows models to learn rich representations of data from vast amounts of unlabeled text or images, which can then be fine-tuned for specific tasks.

Contextual Example

A language model like BERT is trained using self-supervised learning. It takes a sentence, masks out some of the words, and its task is to predict the masked words. The labels (the correct words) are part of the input data itself.

Common Misunderstandings

  • It is technically a form of unsupervised learning, but it uses a supervised-like training process.
  • The pre-training of most modern Large Language Models is a form of self-supervised learning.

Related Terms

Last Updated: December 17, 2025