Alignment

Module: fundamentals

What it is

Alignment refers to making AI systems behave in accordance with human intentions and values. An aligned model does what users actually want, avoids harmful outputs, and behaves helpfully even in edge cases. Alignment is pursued through training techniques such as reinforcement learning from human feedback (RLHF), constitutional AI, and careful dataset curation.
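
To make the RLHF idea concrete, a core step is training a reward model on human preference comparisons: annotators pick which of two responses is better, and the model learns to score the preferred one higher. Below is a minimal sketch of that preference-modeling step in PyTorch. The names (ToyRewardModel, preference_loss) and the fixed-size embeddings are illustrative assumptions, not any particular library's API; a real reward model would score full token sequences with a language-model backbone.

```python
import torch
import torch.nn as nn

class ToyRewardModel(nn.Module):
    """Maps a fixed-size response embedding to a scalar reward (toy stand-in)."""
    def __init__(self, embed_dim: int = 16):
        super().__init__()
        self.score = nn.Linear(embed_dim, 1)

    def forward(self, response_embedding: torch.Tensor) -> torch.Tensor:
        return self.score(response_embedding).squeeze(-1)

def preference_loss(reward_chosen: torch.Tensor,
                    reward_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry objective: push the reward of the human-preferred
    # response above the reward of the rejected one.
    return -nn.functional.logsigmoid(reward_chosen - reward_rejected).mean()

model = ToyRewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Stand-in embeddings for a batch of (chosen, rejected) response pairs.
chosen = torch.randn(8, 16)
rejected = torch.randn(8, 16)

optimizer.zero_grad()
loss = preference_loss(model(chosen), model(rejected))
loss.backward()
optimizer.step()
print(f"preference loss: {loss.item():.4f}")
```

In full RLHF, the trained reward model then guides a reinforcement learning step (commonly PPO) that updates the language model to produce higher-reward responses while staying close to its original behavior.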

Why it matters

Alignment is why AI assistants refuse harmful requests and try to be genuinely helpful. A misaligned AI might technically complete a task while violating user intent or causing harm, as when a model asked for a persuasive essay fabricates statistics to make it more convincing. As AI becomes more capable, alignment becomes more important: a powerful but unaligned system would be dangerous in proportion to its capabilities.