dictionary

Tokens & Tokenization

Before an LLM can read or write text, the text must be broken down into tokens. In English, a token averages roughly 4 characters, or about 0.75 words. Common words are often a single token, while longer or rarer words are split into several subword pieces, which do not necessarily align with syllables. Model providers typically bill by the token, and each model's context window has a fixed token limit.
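The rule of thumb above can be turned into a rough estimator. The following is a minimal sketch, not a real tokenizer: the function names `estimate_tokens` and `fits_context` and the `reserved_for_output` parameter are illustrative assumptions, and actual counts depend on the specific model's tokenizer.

```python
import math

# Rule-of-thumb ratios for English text, taken from the paragraph above.
CHARS_PER_TOKEN = 4
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str) -> int:
    """Roughly estimate the token count of English text.

    Takes the larger of the character-based and word-based estimates
    so the result errs on the high side.
    """
    by_chars = len(text) / CHARS_PER_TOKEN
    by_words = len(text.split()) / WORDS_PER_TOKEN
    return math.ceil(max(by_chars, by_words))


def fits_context(text: str, context_limit: int, reserved_for_output: int = 0) -> bool:
    """Check whether the estimated prompt size, plus room reserved for
    the model's reply, stays within a context window of `context_limit` tokens."""
    return estimate_tokens(text) + reserved_for_output <= context_limit


if __name__ == "__main__":
    prompt = "Before an LLM can read or write text, it must be broken into tokens."
    print(estimate_tokens(prompt))
    print(fits_context(prompt, context_limit=8000, reserved_for_output=500))
```

An estimate like this is only suitable for quick budgeting. For billing or hard limits, count tokens with the tokenizer that belongs to the model in use.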

Category: Models

Reading time: 2 min read

Last updated: Feb 19, 2025

Definition

The base units of data processed by an LLM. A token is typically a chunk of characters rather than a full word.
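To illustrate how a word can become one or several chunks, here is a toy sketch of greedy longest-match splitting against a fixed vocabulary. The vocabulary `TOY_VOCAB` and the function `tokenize` are invented for illustration. Production tokenizers learn their vocabularies from data and use their own algorithms, so their splits will differ.

```python
def tokenize(word: str, vocab: set[str]) -> list[str]:
    """Split `word` into the longest vocabulary pieces, left to right.

    Characters not covered by any vocabulary entry become
    single-character tokens.
    """
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest candidate piece starting at position i first.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            # No vocabulary entry matched: emit one character and move on.
            tokens.append(word[i])
            i += 1
    return tokens


# Hypothetical vocabulary, chosen only to make the example readable.
TOY_VOCAB = {"the", "token", "ization", "un", "believ", "able"}

if __name__ == "__main__":
    print(tokenize("the", TOY_VOCAB))           # ['the']
    print(tokenize("tokenization", TOY_VOCAB))  # ['token', 'ization']
    print(tokenize("unbelievable", TOY_VOCAB))  # ['un', 'believ', 'able']
```

The common word maps to a single token, while the longer words break into several pieces, which is the behavior the definition describes.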

