Artificial Intelligence & Machine Learning
Self-Supervised Learning
Definition
Self-supervised learning is a type of machine learning where the model learns from the data itself without explicit human-provided labels. It does this by creating its own labels from the input data, by solving a "pretext task."
Why It Matters
Self-supervised learning has been a major breakthrough, especially in NLP. It allows models to learn rich representations of data from vast amounts of unlabeled text or images, which can then be fine-tuned for specific tasks.
Contextual Example
A language model like BERT is trained using self-supervised learning. It takes a sentence, masks out some of the words, and its task is to predict the masked words. The labels (the correct words) are part of the input data itself.
Common Misunderstandings
- It is technically a form of unsupervised learning, but it uses a supervised-like training process.
- The pre-training of most modern Large Language Models is a form of self-supervised learning.