Training Data Bias
Module: ethics
What it is
Training data bias occurs when the data used to train an AI doesn't represent the full population fairly. If the training data overrepresents certain groups or contains historical prejudices, the model learns and reproduces those patterns. For example, an AI trained mostly on English text will perform worse in other languages.
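To make the idea concrete, here is a minimal sketch using an entirely hypothetical toy dataset and a deliberately simple "model": one group is heavily overrepresented in the training data, the model learns the rule that fits the majority, and accuracy for the underrepresented group collapses. The data, groups, and learning rule are illustrative assumptions, not a real training pipeline.

```python
# Toy illustration (hypothetical data): an imbalanced training set
# can leave a model accurate for the majority group but wrong for
# the underrepresented one.
import random

random.seed(0)

def make_examples(group, n, positive_rate):
    # Each example is (group, feature, label). In this toy setup the
    # feature matches the label for group "A" but is flipped for
    # group "B", so one global rule cannot fit both groups.
    examples = []
    for _ in range(n):
        label = 1 if random.random() < positive_rate else 0
        feature = label if group == "A" else 1 - label
        examples.append((group, feature, label))
    return examples

# Group A is heavily overrepresented in the training data.
train = make_examples("A", 900, 0.5) + make_examples("B", 100, 0.5)

# "Training": pick whichever single rule (predict the feature, or
# predict its opposite) fits most of the training data. Because group A
# dominates, the model learns group A's pattern.
agree = sum(1 for _, f, y in train if f == y)
rule_is_agree = agree >= len(train) / 2

def predict(feature):
    return feature if rule_is_agree else 1 - feature

# Evaluating per group exposes the bias the overall accuracy hides.
for group in ("A", "B"):
    subset = [(f, y) for g, f, y in train if g == group]
    acc = sum(1 for f, y in subset if predict(f) == y) / len(subset)
    print(f"group {group}: accuracy = {acc:.2f}")
```

Running the sketch prints perfect accuracy for the overrepresented group and near-zero accuracy for the other, which is why evaluating models per group, not just in aggregate, is a common way to surface this kind of bias.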
Why it matters
AI learns from its training data—including the biases embedded in it. Models trained on internet text absorb the stereotypes and prejudices present there. Recognising training data as a source of bias helps explain why AI might produce unfair or stereotyped outputs and why diverse, curated training data matters.