
Temperature

Also called sampling temperature or softmax temperature.

Temperature is the knob that controls how random a model's output is. Low temperature (close to 0) makes the model almost always pick the most likely next token: factual, deterministic, repetitive. High temperature (1.0 and up) flattens the probability distribution: more creative, more varied, more prone to wild outputs.

Internally, the model produces a raw score (a logit) for every token in its vocabulary at each step. Temperature divides those logits before the softmax normalizes them into probabilities. Smaller temperatures make the peaks sharper, so probable tokens dominate. Larger temperatures flatten the distribution and give unlikely tokens a real chance.
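
To make that concrete, here is a minimal sketch of temperature-scaled softmax in Python with NumPy. The function name and the toy three-token vocabulary are illustrative, not taken from any particular model.

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw logits into a probability distribution, scaled by temperature."""
    scaled = np.asarray(logits, dtype=float) / temperature
    # Subtract the max before exponentiating for numerical stability.
    exps = np.exp(scaled - scaled.max())
    return exps / exps.sum()

logits = [4.0, 2.0, 1.0]  # toy scores for a three-token vocabulary
for t in (0.5, 1.0, 2.0):
    print(t, softmax_with_temperature(logits, t).round(3))
```

At 0.5 the top token takes roughly 98% of the probability mass; at 2.0 it drops to about 63% and the tail wakes up.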

At temperature 0 the model decodes greedily (in practice, APIs treat 0 as argmax rather than dividing by zero): it always picks the highest-probability token. Useful for fact retrieval, code generation, and anywhere you want the same answer twice. At temperature 1.0 the model samples in proportion to its trained distribution, which usually feels natural and varied. Above 1.0 the model gets weirder: it starts picking tokens it would normally pick only rarely, which can produce surprising creative writing or outright nonsense.
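
A sketch of a single decoding step under those regimes, assuming the same toy logits as above; the argmax special case mirrors how APIs typically implement temperature 0:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_next_token(logits, temperature):
    """Pick the next token index: greedy at temperature 0, sampled otherwise."""
    logits = np.asarray(logits, dtype=float)
    if temperature == 0:
        # Dividing by zero is undefined, so T=0 is implemented as argmax.
        return int(logits.argmax())
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))

logits = [4.0, 2.0, 1.0]
print([sample_next_token(logits, 0.0) for _ in range(5)])  # always token 0
print([sample_next_token(logits, 1.5) for _ in range(5)])  # a varied mix
```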

Most chat APIs default to somewhere around 0.7 to 1.0 because pure determinism feels mechanical and pure flat-distribution sampling feels unhinged. Different vendors pick slightly different defaults. The values are not universally calibrated: temperature 0.7 on one model is not the same vibe as 0.7 on another.
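
As one concrete example, here is how the temperature parameter is set with the OpenAI Python SDK; the model name is a placeholder, and other vendors expose the same knob with their own names, ranges, and defaults:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; any chat model works
    messages=[{"role": "user", "content": "Name three uses for a brick."}],
    temperature=0.2,      # near-deterministic; raise toward 1.0 for variety
)
print(response.choices[0].message.content)
```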

The mistake people most often make: setting temperature high to "be more creative" without realizing that the model is also more likely to hallucinate facts, lose narrative coherence, or produce nonsensical jumps. Creativity and reliability trade off here. For a brainstorm, high temperature can help. For "what year did this happen," temperature 0 makes a confidently wrong answer less likely, though it is no guarantee against hallucination.

Temperature combines with top-p (nucleus sampling) and top-k as the three knobs of stochastic decoding. Each works slightly differently; in practice, modern APIs let you set all three but most tuning happens on temperature alone.
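
A sketch of how the three knobs can compose in a single sampling step. Real implementations differ in the order they apply top-k and top-p, so treat this ordering (temperature, then top-k, then top-p) as one reasonable choice rather than the standard:

```python
import numpy as np

rng = np.random.default_rng(42)

def decode_step(logits, temperature=0.8, top_k=50, top_p=0.95):
    """One decoding step: temperature scaling, then top-k, then top-p (nucleus)."""
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1][:top_k]        # top-k: keep the k most probable
    cum = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cum, top_p)) + 1  # smallest prefix covering top_p
    keep = order[:cutoff]
    kept = probs[keep] / probs[keep].sum()         # renormalize the survivors
    return int(rng.choice(keep, p=kept))
```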
