Tokens & Tokenization
Before an LLM can read or write text, the text must be broken down into tokens. In English, a token averages roughly 4 characters, or about 0.75 words. Common words are often a single token, while longer or rarer words are split into several subword pieces, which do not necessarily line up with syllables. API providers typically bill by the token, and each model's context window has a fixed token limit.
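The rule of thumb above can be turned into a quick budget check. The following is a minimal sketch, not a real tokenizer: the function names, the averaging of the two heuristics, and the `context_limit` and `price_per_million_tokens` parameters are all assumptions made for illustration. Actual counts depend on the specific model's tokenizer and can differ noticeably, especially for code or non-English text.

```python
import math

# Rule-of-thumb ratios for English text, taken from the description above.
CHARS_PER_TOKEN = 4.0
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str) -> int:
    """Rough token estimate: average of the character and word heuristics."""
    by_chars = len(text) / CHARS_PER_TOKEN
    by_words = len(text.split()) / WORDS_PER_TOKEN
    return math.ceil((by_chars + by_words) / 2)


def fits_context(text: str, context_limit: int, reserved_for_output: int = 0) -> bool:
    """Check whether the estimated prompt size, plus room for a reply, fits the limit."""
    return estimate_tokens(text) + reserved_for_output <= context_limit


def estimate_cost(text: str, price_per_million_tokens: float) -> float:
    """Estimated input cost, given a hypothetical price per million tokens."""
    return estimate_tokens(text) * price_per_million_tokens / 1_000_000
```

An estimate like this is useful for planning, but for billing or hard limits it is safer to count with the tokenizer that belongs to the model in use.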
Category: Models
Reading time: 2 min read
Last updated: Feb 19, 2025
Definition
The base units of data processed by an LLM. A token is typically a chunk of characters rather than a full word.
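To make "a chunk of characters rather than a full word" concrete, here is a toy sketch of subword splitting. It uses greedy longest-match against a tiny hand-written vocabulary; both `TOY_VOCAB` and `tokenize` are hypothetical and exist only for this illustration. Production tokenizers learn their vocabularies from large corpora (for example with byte-pair encoding) and usually apply different matching rules, so real splits will not match these.

```python
# Hypothetical, hand-picked vocabulary; real vocabularies hold tens of
# thousands of learned pieces.
TOY_VOCAB = {"the", " the", "token", "ization", " cat", "un", "believ", "able"}


def tokenize(text: str, vocab: set[str]) -> list[str]:
    """Split text by repeatedly taking the longest vocabulary piece that matches.

    Characters not covered by the vocabulary fall back to single-character tokens.
    """
    max_len = max((len(piece) for piece in vocab), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab or size == 1:
                tokens.append(piece)
                i += size
                break
    return tokens


print(tokenize("tokenization", TOY_VOCAB))
print(tokenize("unbelievable", TOY_VOCAB))
```

With this vocabulary, a word splits into pieces that are not syllables, and the leading space in " cat" is part of the token, which mirrors how many real tokenizers attach whitespace to the following piece.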