Pre-training

Module: fundamentals

What it is

Pre-training is the initial, large-scale training phase in which a model learns general language patterns from massive text datasets, typically by learning to predict the next token in a sequence. Through this process the model picks up grammar, factual knowledge, reasoning patterns, and writing styles. This phase is computationally expensive and is typically done once by the AI company building the model.
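
To make the mechanics concrete, here is a minimal sketch of the standard pre-training objective, next-token prediction: shift the token sequence by one position and train the model to predict each next token with a cross-entropy loss. The model, sizes, and random batch below are illustrative placeholders (TinyCausalLM, VOCAB_SIZE, and so on are invented for this sketch), not a production setup; real runs use transformer models with billions of parameters trained on trillions of tokens.

```python
import torch
import torch.nn as nn

# Toy sizes for illustration; real LLMs use ~50k+ token vocabularies,
# context lengths in the thousands, and billions of parameters.
VOCAB_SIZE, EMBED_DIM, CONTEXT_LEN = 1000, 64, 32

class TinyCausalLM(nn.Module):
    """A deliberately small stand-in for a transformer language model."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        layer = nn.TransformerEncoderLayer(
            d_model=EMBED_DIM, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(EMBED_DIM, VOCAB_SIZE)

    def forward(self, tokens):
        # Causal mask: each position may only attend to earlier tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        hidden = self.encoder(self.embed(tokens), mask=mask)
        return self.lm_head(hidden)

model = TinyCausalLM()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

# Stand-in for one tokenized batch from the pre-training corpus.
batch = torch.randint(0, VOCAB_SIZE, (8, CONTEXT_LEN))

# Next-token prediction: inputs are tokens[:-1], targets are tokens[1:].
inputs, targets = batch[:, :-1], batch[:, 1:]
logits = model(inputs)
loss = loss_fn(logits.reshape(-1, VOCAB_SIZE), targets.reshape(-1))

loss.backward()
optimizer.step()
optimizer.zero_grad()
print(f"one step of next-token loss: {loss.item():.3f}")
```

At real scale, essentially this same loop runs across thousands of accelerators for weeks over a huge corpus, which is what makes the phase so expensive.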

Why it matters

Pre-training creates the foundation that makes LLMs useful. It's why you can ask a model about almost any topic: broad pre-training exposed it to diverse knowledge. The quality and breadth of the pre-training data significantly affect what the model knows and how well it reasons.