Inference
Module: fundamentals
What it is
Inference is the process by which a trained model generates outputs for new inputs: the actual use of the model after training is complete. When you send a prompt to ChatGPT and receive a response, that's inference. A single inference request requires computing resources, but far less than training does.
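To make this concrete, here is a minimal sketch of running inference with an already-trained model, using the Hugging Face transformers library. The model name and prompt are arbitrary placeholders; any pretrained text-generation model would work the same way.

```python
# Minimal inference sketch: load a pretrained model and generate output
# for a new input. No training happens here; the weights are fixed.
from transformers import pipeline

# "gpt2" is used as a small, freely available placeholder model.
generator = pipeline("text-generation", model="gpt2")

# Each call is one inference request: prompt in, generated text out.
result = generator("The capital of France is", max_new_tokens=10)
print(result[0]["generated_text"])
```

Note that nothing about the model changes between calls; inference only reads the trained weights to compute an output.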
Why it matters
Inference is what you're paying for when using AI services, and token-based pricing reflects inference costs. Understanding inference explains why responses take time (the model generates output one token at a time, each requiring a forward pass through the network), why longer outputs cost more, and why running models locally requires capable hardware.
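The pricing arithmetic is straightforward. Below is a hedged sketch of how token-based billing is typically computed; the per-token rates are hypothetical placeholders, not any provider's actual prices.

```python
# Token-based pricing arithmetic. The rates below are hypothetical
# placeholders chosen for illustration, not a real provider's prices.
INPUT_PRICE_PER_1K = 0.0005   # assumed: dollars per 1,000 input tokens
OUTPUT_PRICE_PER_1K = 0.0015  # assumed: dollars per 1,000 output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one inference request under the assumed rates."""
    return (input_tokens / 1000) * INPUT_PRICE_PER_1K + \
           (output_tokens / 1000) * OUTPUT_PRICE_PER_1K

# A 200-token prompt with an 800-token response. Output tokens are
# priced higher here, which is why longer outputs cost more.
print(f"${request_cost(200, 800):.6f}")  # $0.001300
```

Providers commonly charge more per output token than per input token, since every generated token requires its own forward pass through the model.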