Distillation
Module: tool mastery
What it is
Distillation is training a smaller model to mimic a larger one. The small "student" model is trained on the outputs of the large "teacher" model, such as its predicted probability distributions or generated text, rather than only on the original labels, and so captures much of the teacher's capability in a more compact form. The result is a smaller, faster, cheaper model that performs surprisingly well.
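As a concrete sketch, the classic recipe combines two objectives: a "soft" loss that pulls the student's output distribution toward the teacher's temperature-softened distribution, and a "hard" loss against the ground-truth labels. The snippet below is a minimal illustration assuming PyTorch; the function name distillation_loss and the hyperparameters T (temperature) and alpha (mixing weight) are illustrative choices, not part of any particular library.

```python
# Minimal distillation-loss sketch (Hinton-style), assuming PyTorch.
# student_logits and teacher_logits are raw model outputs for the same batch.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: match the teacher's temperature-softened distribution.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients stay comparable across temperatures

    # Hard targets: ordinary cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    # Blend the two objectives; alpha controls how much the student
    # follows the teacher versus the original labels.
    return alpha * soft_loss + (1 - alpha) * hard_loss
```

In practice the teacher runs in inference mode only; its logits are precomputed or generated on the fly, and only the student's weights are updated.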
Why it matters
Distillation is a big part of why small models keep getting better: many of today's capable small models learned from larger ones. It's also a practical technique. If you need a capable model that runs efficiently, a distilled model may offer the best balance of quality, speed, and cost. And it raises questions about intellectual property in model training, since a student inherits capability from its teacher's outputs.