Kevin Champlin

Cache creation

cache-creation
Also called: cache creation tokens, cache_creation_input_tokens

Cache creation tokens are input tokens that were written to the prompt cache for reuse on later turns. They cost slightly more than normal input when they are written (roughly a 25 percent premium), but every subsequent turn that hits this cache reads those same tokens at about 10 percent of the normal input price.

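When the side rail shows cache_creation: 4.2K, that is 4,200 input tokens that were just written into the cache. You paid a small premium to write them, but the math pays off as soon as a second turn re-reads them: at the rates above, one cache write plus one cache read costs about 1.35 times the normal input price for those tokens, against 2 times for two fresh reads. By the third turn the gap is wider still: one write plus two reads comes to about 1.45 times, against 3 times for three fresh reads.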
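The break-even arithmetic can be checked with a short sketch. The two multipliers come from the rough figures quoted in this entry; the function names and the per-token price are hypothetical, chosen only for illustration, and real pricing depends on the provider and model.

```python
# Assumed multipliers, taken from the approximate figures in this entry.
WRITE_MULTIPLIER = 1.25  # cache write: roughly a 25 percent premium
READ_MULTIPLIER = 0.10   # cache read: roughly 10 percent of normal input price


def cached_cost(tokens: int, turns: int, price_per_token: float) -> float:
    """Cost of one cache write on turn 1 plus a cache read on each later turn."""
    if turns < 1:
        raise ValueError("turns must be at least 1")
    write = tokens * price_per_token * WRITE_MULTIPLIER
    reads = tokens * price_per_token * READ_MULTIPLIER * (turns - 1)
    return write + reads


def uncached_cost(tokens: int, turns: int, price_per_token: float) -> float:
    """Cost of sending the same tokens as fresh input on every turn."""
    if turns < 1:
        raise ValueError("turns must be at least 1")
    return tokens * price_per_token * turns


if __name__ == "__main__":
    price = 3.00 / 1_000_000  # hypothetical: $3 per million input tokens
    for turns in (1, 2, 3):
        cached = cached_cost(4200, turns, price)
        fresh = uncached_cost(4200, turns, price)
        print(f"turns={turns} cached=${cached:.6f} fresh=${fresh:.6f}")
```

With these assumed multipliers, caching costs more only when the conversation ends after a single turn; from the second turn on, the cached path is cheaper.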

In a healthy chat session, cache_creation is large on the first turn (writing the system prompt, persona, retrieved corpus) and effectively zero on every turn after. If you see cache_creation staying high across multiple turns, that is a symptom of the cache silently missing and being rewritten. The cache matches on an exact prefix, so anything that changes between turns forces a fresh write. Common causes: a timestamp embedded in the system prompt, a randomly ordered list of retrieved chunks, or a session ID interpolated into the prefix.
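A minimal sketch of how that symptom might be caught and one cause avoided. It assumes each turn's usage is available as a dict carrying the cache_creation_input_tokens field; the function names, the threshold, and the chunk-ID keys are all hypothetical.

```python
from typing import Dict, List


def rewrite_turns(usages: List[Dict[str, int]], threshold: int = 1024) -> List[int]:
    """Return 0-based indexes of turns after the first that wrote more than
    `threshold` tokens to the cache, which suggests the prefix was rewritten."""
    flagged = []
    for index, usage in enumerate(usages):
        if index == 0:
            continue  # the first turn is expected to write the cache
        if usage.get("cache_creation_input_tokens", 0) > threshold:
            flagged.append(index)
    return flagged


def stable_prefix(system_prompt: str, chunks: Dict[str, str]) -> str:
    """Join the system prompt and retrieved chunks in a deterministic order
    (sorted by chunk ID) so the prefix is identical from turn to turn."""
    ordered = [chunks[key] for key in sorted(chunks)]
    return "\n\n".join([system_prompt, *ordered])
```

Timestamps and session IDs are best kept out of the string that stable_prefix returns and placed after the cached portion of the prompt, where they cannot invalidate it.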

This site's chat is structured so the cache is written once at the start of a conversation and read on every following turn. /cost-of-mind reports the dollars saved by this pattern as a running total.
