Inference
Module: fundamentals
What it is
Inference is the process by which a trained model generates outputs for new inputs: the actual use of the model after training is complete. When you send a prompt to ChatGPT and receive a response, that's inference. A single inference request requires computing resources, but far less than training does.
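To make this concrete, here is a minimal sketch of running inference with an already-trained model, using the Hugging Face transformers library. The model name and prompt are arbitrary placeholders; any pretrained text-generation model would work the same way.

```python
# Minimal inference sketch: load a pretrained model and generate output
# for a new input. No training happens here; the weights are fixed.
from transformers import pipeline

# "gpt2" is used as a small, freely available placeholder model.
generator = pipeline("text-generation", model="gpt2")

# Each call is one inference request: prompt in, generated text out.
result = generator("The capital of France is", max_new_tokens=10)
print(result[0]["generated_text"])
```

Note that nothing about the model changes between calls; inference only reads the trained weights to compute an output.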
Why it matters
Inference is what you're paying for when using AI services, and token-based pricing reflects inference costs. Understanding inference explains why responses take time (the model generates output one token at a time, each requiring a forward pass through the network), why longer outputs cost more, and why running models locally requires capable hardware.
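The pricing arithmetic is straightforward. Below is a hedged sketch of how token-based billing is typically computed; the per-token rates are hypothetical placeholders, not any provider's actual prices.

```python
# Token-based pricing arithmetic. The rates below are hypothetical
# placeholders chosen for illustration, not a real provider's prices.
INPUT_PRICE_PER_1K = 0.0005   # assumed: dollars per 1,000 input tokens
OUTPUT_PRICE_PER_1K = 0.0015  # assumed: dollars per 1,000 output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one inference request under the assumed rates."""
    return (input_tokens / 1000) * INPUT_PRICE_PER_1K + \
           (output_tokens / 1000) * OUTPUT_PRICE_PER_1K

# A 200-token prompt with an 800-token response. Output tokens are
# priced higher here, which is why longer outputs cost more.
print(f"${request_cost(200, 800):.6f}")  # $0.001300
```

Providers commonly charge more per output token than per input token, since every generated token requires its own forward pass through the model.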