Self-Supervised Learning
Module: fundamentals
What it is
Self-supervised learning is a training approach where the model creates its own training signal from unlabelled data. For LLMs, this means predicting masked or next tokens in existing text, with no human labelling required. The structure of the data itself provides the supervision signal.
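The next-token case can be sketched in a few lines of Python: from one raw sentence, we derive (context, target) training pairs using nothing but the text itself. This is a toy illustration; it splits on whitespace rather than using a real subword tokenizer.

```python
# Toy sketch: deriving next-token training pairs from raw, unlabelled text.
# Real LLMs use subword tokenizers and integer token IDs; whitespace splitting
# here is a stand-in for illustration only.

text = "the cat sat on the mat"
tokens = text.split()

# Each position's "label" is simply the next token in the sequence:
# the data itself supplies the supervision signal, no human annotation needed.
pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

for context, target in pairs:
    print(context, "->", target)
# e.g. ['the'] -> cat
#      ['the', 'cat'] -> sat
```

One sentence of six tokens already yields five training examples, which is why raw text scales so well as a training signal.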
Why it matters
Self-supervised learning is why LLMs can train on internet-scale data. Manually labelling billions of examples would be impossible, but self-supervised approaches let models learn from raw text. This is a key reason modern AI has progressed so rapidly—training data can be gathered rather than painstakingly created.