Training Data Bias

Module: ethics

What it is

Training data bias occurs when the data used to train an AI doesn't fairly represent the full population the model will serve. If training data overrepresents certain groups or contains historical prejudices, the model learns those patterns. For example, an AI trained mostly on English text will perform worse on other languages. A rough representation check is sketched below.
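
As a rough illustration, here is a minimal sketch of how one might audit group representation in a training set before training. It assumes a hypothetical dataset where each example carries a "language" label; the example data and the function name representation_report are illustrative, not part of any particular library.

    from collections import Counter

    # Hypothetical training examples, each tagged with the language of its text.
    # In a real audit these labels would come from your dataset's metadata.
    training_examples = [
        {"text": "...", "language": "en"},
        {"text": "...", "language": "en"},
        {"text": "...", "language": "en"},
        {"text": "...", "language": "es"},
        {"text": "...", "language": "hi"},
    ]

    def representation_report(examples, group_key):
        """Return the fraction of the training data that each group makes up."""
        counts = Counter(example[group_key] for example in examples)
        total = sum(counts.values())
        return {group: count / total for group, count in counts.items()}

    shares = representation_report(training_examples, "language")
    for group, share in sorted(shares.items(), key=lambda item: -item[1]):
        print(f"{group}: {share:.0%} of training examples")

A heavily skewed report, such as one language dominating the counts, is a warning sign that the model will likely perform worse on the underrepresented groups. The same kind of count can be run over any attribute the dataset records, such as region or demographic group.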

Why it matters

AI learns from its training data—including the biases embedded in it. Models trained on internet text absorb the stereotypes and prejudices present there. Recognising training data as a source of bias helps explain why AI might produce unfair or stereotyped outputs and why diverse, curated training data matters.