Fine-tuning

Also called SFT, supervised fine-tuning, or instruction tuning

Fine-tuning is the second stage of training a language model. The base model has learned language statistics from a huge corpus; fine-tuning teaches it to follow instructions, adopt a persona, or specialize on a narrow domain by training it further on a smaller, curated dataset of examples.

A base model fresh out of pre-training is technically powerful but practically clumsy. It can complete sentences fluently but does not naturally answer questions, follow instructions, or refuse harmful requests. Fine-tuning is what turns that base into a chat-shaped tool.

The standard recipe is supervised fine-tuning (SFT) followed by reinforcement learning from human feedback (RLHF). SFT teaches the model the format: here is what a question looks like, here is what an answer looks like, here is what a refusal looks like. RLHF then tunes the model's preferences: of two possible responses, which would a human prefer, and how can we shift the model to produce the preferred one more often. Together, the two stages turn "fluent text completion" into "helpful assistant."
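To make the two stages concrete, here is a minimal Python sketch of what each one optimizes. The record field names are hypothetical, not a standard schema. Computing the SFT loss over response tokens only is one common choice, assumed here. The pairwise loss is the form typically used to train a reward model from human comparisons; a full RLHF pipeline has further steps not shown.

```python
import math

# Hypothetical record shapes; the field names are illustrative only.
sft_example = {
    "prompt": "What is the capital of France?",
    "response": "The capital of France is Paris.",
}

preference_example = {
    "prompt": "Explain what a base model is.",
    "chosen": "A base model is a network trained only to predict the next token.",
    "rejected": "idk",
}


def sft_loss(token_log_probs, is_response_token):
    """Average negative log-likelihood over response tokens only.

    token_log_probs: log-probability the model gave each target token.
    is_response_token: True where the token belongs to the response, so
    prompt tokens do not contribute to the loss (an assumed convention).
    """
    picked = [lp for lp, keep in zip(token_log_probs, is_response_token) if keep]
    return -sum(picked) / len(picked)


def preference_loss(reward_chosen, reward_rejected):
    """Pairwise loss -log(sigmoid(r_chosen - r_rejected)).

    It shrinks as the preferred response is scored further above the
    rejected one.
    """
    margin = reward_chosen - reward_rejected
    return math.log1p(math.exp(-margin))
```

The SFT loss pushes the model toward reproducing the curated answer. The preference loss only cares about the ordering of two candidate answers.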

Fine-tuning can also be domain-specific. A general model fine-tuned on legal contracts becomes notably better at legal tasks (and somewhat worse at unrelated ones). A general model fine-tuned on medical literature becomes better suited to clinical questions. Companies that talk about "their AI" often mean a fine-tuned variant of a foundation model from one of the labs.

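There is a cheaper alternative: parameter-efficient fine-tuning (PEFT). Rather than updating every parameter of a model (70 billion in a large one), methods like LoRA freeze the original weights and train only a small set of additional low-rank matrices. These can be applied alongside the frozen weights at inference time or merged into them. The result is often close to full fine-tuning at a fraction of the cost, and the small adapter is easier to ship.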
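The idea behind LoRA can be sketched in plain Python for a single weight matrix. This is an illustration under stated assumptions, not any library's API: the helper names (`init_adapter`, `adapted_forward`, `param_counts`) are made up, and the matrix sizes are arbitrary. The frozen weight stays untouched, and only the two small matrices `down` and `up` would be trained.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def init_adapter(d_in, d_out, rank, seed=0):
    """Create the two small trainable matrices.

    `down` starts random and `up` starts at zero, so the adapter
    contributes nothing before training begins.
    """
    rng = random.Random(seed)
    down = [[rng.gauss(0.0, 0.01) for _ in range(rank)] for _ in range(d_in)]
    up = [[0.0] * d_out for _ in range(rank)]
    return down, up


def adapted_forward(x, frozen_w, down, up, alpha):
    """Compute x @ frozen_w + (alpha / rank) * (x @ down @ up)."""
    rank = len(up)
    scale = alpha / rank
    base = matmul(x, frozen_w)
    delta = matmul(matmul(x, down), up)
    return [[b + scale * d for b, d in zip(b_row, d_row)]
            for b_row, d_row in zip(base, delta)]


def param_counts(d_in, d_out, rank):
    """Return (full fine-tuning parameters, adapter parameters) for one matrix."""
    return d_in * d_out, rank * (d_in + d_out)


# Illustrative sizes: one 4096 x 4096 matrix with a rank-8 adapter.
full, adapter = param_counts(4096, 4096, 8)
print(full, adapter, full // adapter)
```

For these example sizes the adapter has 65,536 trainable parameters against 16,777,216 in the full matrix, a factor of 256. Because `up` starts at zero, the adapted model initially produces exactly the same outputs as the frozen one.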

Fine-tuning is not the same as in-context learning, where you put examples in the prompt and the model adapts within a single conversation. Fine-tuning changes the weights, so the new behavior persists across conversations. In-context learning leaves the weights untouched; it is just a smart prompt. Fine-tuning is more powerful but more expensive; in-context learning needs no training but is limited to whatever fits in the context window.
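For contrast, in-context learning amounts to assembling a prompt. The sketch below is a hypothetical helper that packs as many worked examples as fit into a token budget. The `Q:`/`A:` layout is an arbitrary choice, and `count_tokens` is a crude word-count stand-in for a real tokenizer.

```python
def count_tokens(text):
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def build_few_shot_prompt(examples, query, max_tokens):
    """Prepend as many (question, answer) examples as fit in the budget."""
    tail = f"Q: {query}\nA:"
    used = count_tokens(tail)
    if used > max_tokens:
        raise ValueError("query alone exceeds the context budget")
    shots = []
    for question, answer in examples:
        shot = f"Q: {question}\nA: {answer}\n\n"
        cost = count_tokens(shot)
        if used + cost > max_tokens:
            break  # no room left: remaining examples are dropped
        shots.append(shot)
        used += cost
    return "".join(shots) + tail


examples = [("2 + 2?", "4"), ("3 + 5?", "8")]
prompt = build_few_shot_prompt(examples, "7 + 1?", max_tokens=100)
```

Nothing about the model changes here. The examples have to be sent again with every request, and once the budget is exhausted the remaining ones are left out.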
