
Tokens

Also called: token, input tokens, output tokens

Tokens are the units a language model reads and writes. Roughly four characters of English per token, give or take. Every cost and every limit on this site is denominated in tokens, not words or characters, because that is what the model actually counts.

A token is the atomic unit of text inside a model. Tokenization is the process that turns your prompt into a sequence of integers the model can do math on. Common English words are usually one token. Rare words and unusual punctuation can be two, three, or more. A single emoji can be several tokens. Code is generally less efficient than prose: a Python keyword is often one token, but a long variable name might be five.
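
To see this concretely, here is a minimal sketch using OpenAI's open-source tiktoken library, one tokenizer among several; Anthropic and Google use their own vocabularies, so the exact counts below depend on which encoding you pick.

```python
# pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # one specific vocabulary, used here only as an example

samples = [
    "the",                                          # common English word: typically 1 token
    "antidisestablishmentarianism",                 # rare word: splits into several subword pieces
    "🤖",                                           # a single emoji can be more than one token
    "extremely_long_variable_name_for_demo = 42",   # code: long identifiers fragment heavily
]

for text in samples:
    ids = enc.encode(text)  # the list of integers the model actually sees
    print(f"{text!r}: {len(ids)} tokens -> {ids}")
```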

This matters because pricing is per million tokens, context windows are measured in tokens, and rate limits are denominated in tokens per minute. When the side rail on a chat turn shows input_tokens: 412, that is what you are paying for on the way in, and the model literally read 412 of these subword units. The number of words in your message is a rough proxy at best.
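
As a back-of-the-envelope illustration, the cost of a single turn is just the token counts scaled by the per-million-token prices. The prices below are placeholders for the sake of arithmetic, not a quote from any provider.

```python
# Hypothetical prices, assumed for illustration only.
PRICE_PER_MTOK_INPUT = 3.00    # USD per million input tokens (assumed)
PRICE_PER_MTOK_OUTPUT = 15.00  # USD per million output tokens (assumed)

input_tokens = 412      # the side-rail figure from the example above
output_tokens = 1_250   # an assumed reply length

cost = (input_tokens / 1_000_000) * PRICE_PER_MTOK_INPUT \
     + (output_tokens / 1_000_000) * PRICE_PER_MTOK_OUTPUT
print(f"${cost:.6f}")   # ~$0.02 for this turn at these assumed prices
```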

There is no universal token vocabulary. Anthropic, OpenAI, and Google all tokenize differently, which means a prompt that costs 1,000 tokens against one model might cost 1,200 against another even though the text is identical. When this site reports cost-per-million-tokens on /state-of-the-art, that comparison is useful but never apples-to-apples.
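
One way to see the divergence is to run the same text through two different vocabularies. The sketch below uses two encodings that happen to ship with tiktoken; tokenizers from other providers would give different counts again.

```python
import tiktoken

text = "Tokenization differences make cross-provider price comparisons approximate."

# Same text, two vocabularies: the token counts will generally not match.
for name in ("cl100k_base", "o200k_base"):
    enc = tiktoken.get_encoding(name)
    print(name, len(enc.encode(text)))
```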
