Context window
context-window
The context window is the maximum amount of text, measured in tokens, that a model can consider in a single request. Everything you send in, plus everything the model writes out, has to fit. Larger windows allow longer documents, entire codebases, and extended multi-turn conversations without earlier content falling out of scope.
When a model has a 200,000-token context window, that is the total budget for one inference call. Your system prompt, conversation history, retrieved documents, and the model's own response all draw from the same pool. The cost of using more context is real: longer prompts cost more on the input side, and the model spends more compute reading them.
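The shared-pool accounting above can be sketched in a few lines. This is an illustrative helper, not a real SDK call: the function name and the token counts are made up, and in practice you would measure prompt size with the provider's tokenizer rather than assume round numbers.

```python
CONTEXT_WINDOW = 200_000  # total token budget for one inference call (illustrative)

def remaining_output_budget(system_tokens: int,
                            history_tokens: int,
                            retrieved_tokens: int,
                            window: int = CONTEXT_WINDOW) -> int:
    """Tokens left for the model's response after the prompt is counted.

    System prompt, conversation history, and retrieved documents all
    draw from the same pool as the response itself.
    """
    used = system_tokens + history_tokens + retrieved_tokens
    if used >= window:
        raise ValueError(f"Prompt ({used} tokens) exceeds the {window}-token window")
    return window - used

# A large retrieval payload leaves relatively little room to respond:
print(remaining_output_budget(2_000, 30_000, 150_000))  # → 18000
```

The point of the sketch is that output space is whatever the prompt leaves behind: pack 182K tokens of context into a 200K window and the model has at most 18K tokens in which to answer.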
Context window size has grown by orders of magnitude in the last two years. Frontier models in 2024 commonly offered 8K to 32K. By 2026, 200K is normal and 1M is available on select models. This is one of the headline numbers on the /state-of-the-art dashboard.
Bigger is not strictly better. Models can suffer from context rot in long inputs, where information in the middle of the window is recalled less reliably than information at the start or end. This is a quiet limitation worth respecting: a 1M-token model is not magically better at recalling the 500,000th token than a 200K-token model is at recalling the 100,000th. Place what matters most near the beginning or end when prompting at scale.
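One common mitigation for middle-of-window recall loss is simply to order the prompt so the most important material sits at the edges. A minimal sketch, assuming a plain string prompt; the function and its argument names are hypothetical, not any library's API:

```python
def assemble_prompt(instructions: str, documents: list[str], question: str) -> str:
    """Order prompt parts so critical text sits at the edges of the window.

    Instructions go first and the question goes last, pushing the bulk
    of retrieved documents into the middle, where weaker recall is
    least damaging.
    """
    parts = [instructions, *documents, question]
    return "\n\n".join(parts)

prompt = assemble_prompt(
    "You are a contract reviewer. Answer only from the documents.",
    ["<doc 1 text>", "<doc 2 text>", "<doc 3 text>"],
    "Which clause governs early termination?",
)
```

The ordering is a heuristic, not a guarantee: it trades on the observation that recall is strongest at the start and end of long contexts rather than eliminating the degradation.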
Related concepts