Token

Module: fundamentals

What it is

A token is the basic unit of text that a language model processes. Rather than reading individual characters or whole words, LLMs split text into tokens: typically common words, word fragments, or punctuation marks. For example, one model might treat "understanding" as a single token, while another might split it into three: "un", "der", and "standing". The exact tokenisation varies by model.
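
To make the splitting concrete, here is a minimal sketch of a greedy longest-match tokeniser. It is a toy, not how any particular model works: production tokenisers typically learn their vocabulary from data, and the vocabulary `TOY_VOCAB` and the function `toy_tokenise` are hypothetical names invented for this illustration.

```python
# Hypothetical toy vocabulary; real models learn far larger ones from data.
TOY_VOCAB = {"un", "der", "standing", "token", "s", " "}


def toy_tokenise(text, vocab=TOY_VOCAB):
    """Split text by greedy longest match against vocab.

    Characters that match no vocabulary entry become single-character tokens.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until an entry matches.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Nothing matched: fall back to a single character.
            tokens.append(text[i])
            i += 1
    return tokens


# Same word, two vocabularies, two different token counts.
print(toy_tokenise("understanding"))
print(toy_tokenise("understanding", TOY_VOCAB | {"understanding"}))
```

With the toy vocabulary the word splits into "un", "der", and "standing"; once "understanding" is itself in the vocabulary, it comes back as one token. That is the sense in which tokenisation varies by model.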

Why it matters

Tokens determine cost and context limits. When a service charges per token, the cost depends on how many tokens you send and receive; when it has a token limit, that limit caps how much text you can input and output. As a rough guide for English, 1 token is about 4 characters or 0.75 words, so a 4,000-token limit corresponds to roughly 3,000 words.
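
The rule of thumb above can be turned into a quick estimator. This is a rough sketch only: the ratios are the approximate English figures quoted in the text, the function names are made up for this example, and an exact count requires running the specific model's tokeniser.

```python
import math

# Approximate English ratios from the text; other languages and models differ.
CHARS_PER_TOKEN = 4
WORDS_PER_TOKEN = 0.75


def estimate_tokens_from_chars(text):
    """Estimate the token count of text from its character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_from_words(word_count):
    """Estimate how many tokens a given number of English words uses."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


def estimate_words_for_limit(token_limit):
    """Estimate how many English words fit within a token limit."""
    return int(token_limit * WORDS_PER_TOKEN)


print(estimate_words_for_limit(4000))    # 3000
print(estimate_tokens_from_words(3000))  # 4000
```

The estimates round up when counting tokens so that a budget check errs on the cautious side.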