Beam search


Also called beam decoding

Beam search is a deterministic decoding strategy that keeps the k highest-scoring partial sequences (called "beams") at every step, expands each by one token, and finally returns the best completed sequence. Because it considers multiple paths at once, it tends to produce more globally coherent output than greedy decoding, which commits to a single path.


Greedy decoding picks the single most probable token at each step and commits to it. The problem is that the locally best choice can lead down a globally worse path. Beam search hedges against this: at each step, it keeps the top-k partial sequences (k is the "beam width"), extends each by one token, scores every extension by its total log-probability, and again keeps the top-k. Beams that emit an end-of-sequence token are set aside as complete, and when generation ends the highest-scoring complete beam is returned.
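The loop described above can be sketched in a few lines of Python. This is a minimal sketch, not a production decoder: it assumes a hypothetical `next_token_logprobs(prefix)` function standing in for the model, scores beams by summed log-probability with no length normalization, and uses a made-up lookup-table "model" chosen so that the greedy path is not the most likely one.

```python
import math
from typing import Callable, Dict, List, Tuple

EOS = "<eos>"
Seq = Tuple[str, ...]


def beam_search(
    next_token_logprobs: Callable[[Seq], Dict[str, float]],
    beam_width: int,
    max_len: int,
) -> Tuple[Seq, float]:
    """Return the best completed sequence and its summed log-probability."""
    live: List[Tuple[Seq, float]] = [((), 0.0)]  # (tokens so far, total log-prob)
    finished: List[Tuple[Seq, float]] = []
    for _ in range(max_len):
        # Expand every live beam by every token the model offers.
        candidates = [
            (seq + (tok,), score + lp)
            for seq, score in live
            for tok, lp in next_token_logprobs(seq).items()
        ]
        # Keep only the top-k candidates by total log-probability.
        candidates.sort(key=lambda c: c[1], reverse=True)
        live = []
        for seq, score in candidates[:beam_width]:
            # Beams that emit EOS are complete and stop expanding.
            (finished if seq[-1] == EOS else live).append((seq, score))
        if not live:
            break
    # Simplification: beams cut off at max_len are treated as complete.
    finished.extend(live)
    return max(finished, key=lambda c: c[1])


# Hypothetical toy "model": a lookup table of next-token probabilities.
# Any prefix not in the table ends the sequence with probability 1.
TABLE: Dict[Seq, Dict[str, float]] = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"x": 0.4, "y": 0.3, "z": 0.3},
    ("B",): {"x": 0.9, "y": 0.1},
}


def toy_model(prefix: Seq) -> Dict[str, float]:
    probs = TABLE.get(prefix, {EOS: 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}


for k in (1, 2):
    seq, score = beam_search(toy_model, beam_width=k, max_len=5)
    print(k, seq, round(math.exp(score), 2))
# prints:
# 1 ('A', 'x', '<eos>') 0.24
# 2 ('B', 'x', '<eos>') 0.36
```

With `beam_width=1` the search follows the locally best token at each step and ends at A x (0.6 × 0.4 = 0.24). With `beam_width=2` it keeps the initially weaker B alive and finds B x (0.4 × 0.9 = 0.36).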

A beam width of 1 is greedy decoding. Wider beams explore more paths and can recover from local mistakes, at the cost of roughly k times the compute per step. Beam widths of 4 to 8 are typical for translation and code generation, where the goal really is the highest-likelihood sequence overall.

For chat-style use cases, beam search has fallen out of favor. Its output tends to be safe, repetitive, and "averaged", which is exactly what you do not want in a creative or conversational setting. Models tuned with RLHF and decoded by sampling at a moderate temperature typically produce more natural-sounding text than the same models decoded with beam search.

Beam search remains a strong choice when you genuinely want the highest-likelihood single answer: machine translation (where the target is usually unambiguous), structured output generation (JSON, code, XML), and any situation where exactness matters more than variety.

A subtle issue with beam search is "length bias". A sequence's probability is the product of its per-token probabilities, so each additional token multiplies in another factor below 1 (equivalently, adds another negative log-probability). Longer sequences therefore tend to score lower simply for being longer. Most beam search implementations apply length normalization so that the search does not default to short outputs. Without normalization, beam search tends to favor terse, sometimes truncated answers; with it, you get more natural full-length responses.
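The effect can be seen by scoring two hypothetical candidates by hand. This sketch assumes one common form of normalization: dividing the summed log-probability by the length raised to an exponent `alpha` (similar in spirit to the `length_penalty` argument of Hugging Face Transformers' `generate`). The function names and probabilities are illustrative, and real implementations differ in the exact penalty.

```python
import math
from typing import Sequence


def raw_score(token_logprobs: Sequence[float]) -> float:
    """Total log-probability: the sum of per-token log-probs (always <= 0)."""
    return sum(token_logprobs)


def normalized_score(token_logprobs: Sequence[float], alpha: float = 1.0) -> float:
    """Divide the total by length**alpha; alpha=1 gives the mean log-prob per token."""
    return sum(token_logprobs) / (len(token_logprobs) ** alpha)


# Hypothetical candidates: a short answer built from mediocre tokens,
# and a longer answer whose every token is individually likely.
short_answer = [math.log(0.6)] * 2  # total p = 0.36
long_answer = [math.log(0.8)] * 6   # total p ~= 0.26

candidates = {"short": short_answer, "long": long_answer}
best_raw = max(candidates, key=lambda name: raw_score(candidates[name]))
best_norm = max(candidates, key=lambda name: normalized_score(candidates[name]))
print(best_raw, best_norm)  # prints: short long
```

The raw sum prefers the two-token answer only because it has fewer factors; per token, the longer answer is more likely (0.8 vs 0.6). Setting `alpha=0` recovers the raw sum, and values between 0 and 1 apply a partial correction.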

For most users of modern chat APIs, beam search is not exposed as a parameter. The vendor has picked a sampling strategy and that is what you get.
