Kevin Champlin

Glossary

The vocabulary of frontier AI.

Every metric on the chat side rail, every benchmark on the dashboard, every concept in the editorial all link back here. Pedagogy is the spine of this site.

Tokens

tokens

Tokens are the units a language model reads and writes. Roughly four characters of English per token, give or take. Every cost and every limit on this site is denominated in tokens, not words or chara...

token input tokens output tokens
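
The four-characters-per-token rule of thumb can be turned into a quick estimator. This is only a sketch: `CHARS_PER_TOKEN` and `estimate_tokens` are illustrative names, and a real tokenizer will give different counts, especially for code or non-English text.

```python
import math

# Rough English average taken from the entry above; an approximation, not a tokenizer.
CHARS_PER_TOKEN = 4

def estimate_tokens(text: str) -> int:
    """Estimate the token count of `text` from its character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)

print(estimate_tokens("The quick brown fox jumps over the lazy dog."))
```

Use an estimate like this for budgeting only; billing is based on the provider's actual token count.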

Context window

context-window

The context window is the maximum amount of text, in tokens, that a model can consider in a single request. Everything you send in plus everything the model writes out has to fit. Larger windows enabl...

context length max context context size
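
Because input and output share the same window, a request can be checked before it is sent. A minimal sketch, assuming a hypothetical 200,000-token window; `fits_in_context` is an illustrative helper, not part of any API.

```python
def fits_in_context(input_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """Return True if the prompt plus the reserved output budget fits in the window."""
    return input_tokens + max_output_tokens <= context_window

WINDOW = 200_000  # hypothetical window size, in tokens

print(fits_in_context(150_000, 8_000, WINDOW))  # prompt leaves room for the answer
print(fits_in_context(195_000, 8_000, WINDOW))  # prompt crowds out the answer
```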

Prompt caching

prompt-caching

Prompt caching is a feature that lets the API store a static prefix of your prompt (system instructions, retrieved documents, conversation history) and reuse it on later requests at a fraction of the normal input c...

prompt cache cache

Cache read

cache-read

Cache read tokens are input tokens that were served from the prompt cache rather than processed fresh. They are billed at roughly 10 percent of normal input price. A high cache read count on a turn me...

cache read tokens cache_read_input_tokens

Cache creation

cache-creation

Cache creation tokens are input tokens that were written to the prompt cache for reuse on later turns. They cost slightly more than normal input on the way in (roughly 25 percent premium), but every s...

cache creation tokens cache_creation_input_tokens
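
The two multipliers quoted in the Cache read and Cache creation entries (roughly 10 percent and a roughly 25 percent premium) combine into a simple input-cost calculation. This is a sketch under those approximate figures; the base price of 3.00 per million tokens is a made-up example, and `input_cost` is an illustrative helper.

```python
CACHE_READ_MULTIPLIER = 0.10   # approximate, from the Cache read entry
CACHE_WRITE_MULTIPLIER = 1.25  # approximate, from the Cache creation entry

def input_cost(fresh: int, cache_read: int, cache_creation: int,
               price_per_million: float) -> float:
    """Cost of one turn's input, weighting cached tokens by their multipliers."""
    weighted_tokens = (
        fresh
        + cache_read * CACHE_READ_MULTIPLIER
        + cache_creation * CACHE_WRITE_MULTIPLIER
    )
    return weighted_tokens * price_per_million / 1_000_000

# First turn writes a 50,000-token prefix; a later turn reads it back.
first_turn = input_cost(fresh=500, cache_read=0, cache_creation=50_000, price_per_million=3.0)
later_turn = input_cost(fresh=500, cache_read=50_000, cache_creation=0, price_per_million=3.0)
print(first_turn, later_turn)
```

The write premium is paid once; every later turn that hits the cache pays the discounted read rate on the same prefix.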

Stop reason

stop-reason

The stop_reason field in an API response tells you why the model stopped generating. Common values: `end_turn` (the model decided it was done), `max_tokens` (it hit the output cap and was cut off), `s...

stop_reason end reason
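
Client code usually branches on this field. A minimal sketch that handles only the two values spelled out in full above and treats anything else as a generic case; `classify_stop` and its return labels are illustrative, not part of any SDK.

```python
def classify_stop(stop_reason: str) -> str:
    """Map a stop_reason value to what the caller should do next."""
    if stop_reason == "end_turn":
        return "complete"   # the model finished on its own
    if stop_reason == "max_tokens":
        return "truncated"  # output hit the cap; consider continuing or raising the limit
    return "other"          # any value this sketch does not handle explicitly

print(classify_stop("max_tokens"))
```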

Latency to first token

latency-to-first-token

Latency to first token, usually abbreviated TTFT (time to first token), is the time between sending a request and seeing the first token of the response stream back. It is what makes a chat interface feel "fast" or "slow" to a user,...

TTFT first token latency time to first token
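
Measuring TTFT amounts to timing the gap until a stream yields its first chunk. The sketch below uses a fake generator in place of a real streaming API call; `fake_stream` and `time_to_first_token` are hypothetical names for illustration.

```python
import time
from typing import Iterable, Iterator

def fake_stream(delay_s: float = 0.05) -> Iterator[str]:
    """Stand-in for a streaming response: waits, then yields chunks."""
    time.sleep(delay_s)  # simulates network and prompt-processing time
    yield "Hello"
    yield ", world"

def time_to_first_token(stream: Iterable[str]) -> float:
    """Seconds from starting to consume the stream until the first chunk arrives."""
    start = time.perf_counter()
    for _ in stream:
        return time.perf_counter() - start
    raise ValueError("stream produced no tokens")

print(f"TTFT: {time_to_first_token(fake_stream()):.3f}s")
```

With a real API, start the clock immediately before issuing the request so that network time is included.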

Hallucination

hallucination

Hallucination is when a model confidently asserts something false. It is not a glitch or a bug; it is the natural output of a system trained to produce plausible text. The model has no internal flag d...

confabulation making things up factual error

Agentic

agentic

"Agentic" describes a system that can take multiple actions in sequence, use tools, observe outcomes, and adjust its plan to reach a goal. It is a descriptor of behavior, not a claim about consciousne...

agent AI agent autonomous agent

Consciousness

consciousness

Consciousness, in this glossary, refers specifically to subjective experience: the property of there being something it is like to be the system. Current AI systems show no evidence of having this pro...

subjective experience sentience awareness

Transformer

transformer

The transformer is the neural network architecture nearly every modern language model is built on. Introduced in 2017, it displaced recurrent networks with one core idea: process all tokens of a sequence at once, with an...

transformer architecture decoder-only transformer

Attention

attention

Attention is the mechanism inside a transformer that lets every token "look at" every other token in the sequence and decide how much each one matters. It is what makes language models good at long-ra...

self-attention attention mechanism multi-head attention
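
The "look at every other token and decide how much each one matters" step can be written out as single-head scaled dot-product attention. This is a simplified pure-Python sketch: the query, key, and value vectors are given directly instead of being produced by learned projections, and there is no causal mask or multi-head split.

```python
import math

def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """For each query, return a weighted average of the value vectors."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dimension).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)  # how much each token matters to this one
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs

# Two tokens with identical keys receive equal weight.
print(attention([[1.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]], [[0.0, 2.0], [2.0, 0.0]]))
```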

Parameters

parameters

A parameter is one number inside the model that gets adjusted during training. A model with "70 billion parameters" has 70 billion such numbers, each tuned by gradient descent to minimize prediction e...

weights model size billions of parameters

Embeddings

embeddings

An embedding is a fixed-length vector of numbers that represents a piece of text in a way that preserves meaning. Two pieces of text with similar meanings produce similar vectors. Used to power semant...

vector embedding text embedding semantic vector
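
"Similar vectors" is typically measured with cosine similarity. A small sketch with hand-made three-dimensional vectors standing in for real embeddings, which have hundreds or thousands of dimensions; the vectors and their labels are invented for illustration.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors, not produced by any real embedding model.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

print(cosine_similarity(cat, kitten) > cosine_similarity(cat, invoice))
```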

Fine-tuning

fine-tuning

Fine-tuning is the second stage of training a language model. The base model has learned language statistics from a huge corpus; fine-tuning teaches it to follow instructions, adopt a persona, or spec...

SFT supervised fine-tuning instruction tuning

RLHF

rlhf

RLHF (Reinforcement Learning from Human Feedback) is the training step that turns a fluent base model into a helpful, honest, harmless assistant. Humans rank pairs of model responses; a separate "rewa...

reinforcement learning from human feedback preference tuning

Constitutional AI

constitutional-ai

Constitutional AI (CAI) is Anthropic's variant of preference tuning where an AI model critiques and revises its own outputs against a written list of principles ("the constitution"), reducing the need...

CAI RLAIF reinforcement learning from AI feedback

Temperature

temperature

Temperature is the knob that controls how random a model's output is. Low temperature (close to 0) makes the model pick the most likely next token almost every time: predictable, near-deterministic, sometimes repetitive. Hi...

sampling temperature softmax temperature
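
Mechanically, temperature divides the model's raw scores (logits) before they are turned into probabilities. A minimal sketch with made-up logits; `softmax_with_temperature` is an illustrative helper.

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities after scaling by 1 / temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; treat 0 as greedy argmax instead")
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
for t in (0.1, 1.0, 10.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

As the temperature falls, probability mass concentrates on the top token; as it rises, the distribution flattens toward uniform.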

Top-p

top-p

Top-p (nucleus sampling) restricts the model to picking the next token from only the smallest set of candidates whose cumulative probability reaches at least p. With p=0.9, the model considers the top token...

nucleus sampling top_p
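
The nucleus is built by walking down the candidates in order of probability until the running total reaches p. A sketch over a made-up four-token distribution; `top_p_filter` is an illustrative helper that returns the renormalized nucleus.

```python
def top_p_filter(probs: dict[str, float], p: float) -> dict[str, float]:
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}  # renormalize to sum to 1

probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}  # made-up distribution
print(top_p_filter(probs, 0.9))
```

Unlike a fixed cutoff, the size of the nucleus changes from step to step: a confident distribution keeps very few tokens, a flat one keeps many.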

Top-k

top-k

Top-k restricts the model to picking the next token from only the k highest-probability candidates. With k=50, the model considers its top 50 guesses and ignores everything else. Older sibling of top-...

top_k sampling
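
In code, top-k is a sort and a slice. A sketch over a made-up distribution; `top_k_filter` is an illustrative helper that renormalizes the survivors.

```python
def top_k_filter(probs: dict[str, float], k: int) -> dict[str, float]:
    """Keep only the k most probable tokens and renormalize."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(prob for _, prob in ranked)
    return {token: prob / total for token, prob in ranked}

probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}  # made-up distribution
print(top_k_filter(probs, 2))
```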

Beam search

beam-search

Beam search is a deterministic decoding strategy that keeps the top-k highest-probability sequences (called "beams") at every step, expanding each, and finally returning the best completed sequence. I...

beam decoding
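
A toy example shows why keeping several beams can beat always taking the single best next token. The lookup table `NEXT` below is a hypothetical stand-in for a model: it gives next-token probabilities based only on the previous token.

```python
import math

# Hypothetical toy "model": next-token probabilities given the last token.
NEXT = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.5, "y": 0.5},
    "b": {"x": 0.9, "y": 0.1},
    "x": {"</s>": 1.0},
    "y": {"</s>": 1.0},
}

def beam_search(beam_width: int, max_steps: int = 5) -> list[str]:
    """Return the highest-scoring sequence found with the given beam width."""
    beams = [(0.0, ["<s>"])]  # each beam is (log-probability, tokens so far)
    for _ in range(max_steps):
        candidates = []
        for logp, tokens in beams:
            if tokens[-1] == "</s>":
                candidates.append((logp, tokens))  # finished beams carry over unchanged
                continue
            for tok, p in NEXT[tokens[-1]].items():
                candidates.append((logp + math.log(p), tokens + [tok]))
        # Keep only the best `beam_width` partial sequences.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
        if all(tokens[-1] == "</s>" for _, tokens in beams):
            break
    return beams[0][1]

print(beam_search(beam_width=1))  # greedy: commits to "a" at the first step
print(beam_search(beam_width=2))  # keeps "b" alive and finds the likelier sequence
```

With a width of 1 the search commits to "a" (probability 0.6) and ends with a sequence of probability 0.3; with a width of 2 it finds the "b", "x" path at 0.36.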

Logit bias

logit-bias

Logit bias is an API parameter that lets you nudge the model toward or away from specific tokens before sampling happens. Common use: forbid certain tokens entirely (set bias to -100), or strongly pre...

logit_bias token suppression
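
The effect of a bias is easiest to see by applying it to raw scores and then converting to probabilities. This sketch uses made-up logits keyed by token string; real APIs generally key the bias map by token ID, and `apply_logit_bias` is an illustrative helper.

```python
import math

def apply_logit_bias(logits: dict[str, float], bias: dict[str, float]) -> dict[str, float]:
    """Add a per-token bias to the logits; tokens without a bias are unchanged."""
    return {tok: logit + bias.get(tok, 0.0) for tok, logit in logits.items()}

def to_probs(logits: dict[str, float]) -> dict[str, float]:
    """Softmax over a token-to-logit mapping."""
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

logits = {"yes": 1.0, "no": 1.0, "maybe": 0.5}  # made-up scores
biased = to_probs(apply_logit_bias(logits, {"no": -100.0}))
print(biased)
```

A bias of -100 does not set the probability to exactly zero, but it makes the token so unlikely that it is effectively never sampled.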

Retrieval-Augmented Generation (RAG)

rag

RAG is the pattern where, before answering a question, the system retrieves relevant text from an external corpus and gives it to the model as context. The model answers from the retrieved chunks rath...

retrieval augmented generation RAG
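
The retrieve-then-prompt flow can be sketched in a few lines. To stay self-contained, this example ranks documents by naive word overlap, where production systems typically use embeddings and a vector index; the corpus, `retrieve`, and `build_prompt` are all invented for illustration.

```python
def retrieve(question: str, corpus: list[str], top_n: int = 2) -> list[str]:
    """Rank documents by how many words they share with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_n]

def build_prompt(question: str, chunks: list[str]) -> str:
    """Place the retrieved chunks ahead of the question as context."""
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"

corpus = [
    "The context window is measured in tokens.",
    "Beam search keeps several candidate sequences.",
    "Top-k keeps the k most likely tokens.",
]
question = "what is the context window"
print(build_prompt(question, retrieve(question, corpus, top_n=1)))
```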

Tool use

tool-use

Tool use is the pattern where the model can decide to call an external function (search, code execution, calculator, API) instead of (or alongside) generating prose. The model emits a structured "tool...

function calling tool calling
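
On the application side, tool use reduces to looking up the requested function and calling it with the supplied arguments. The call shape below (`name` plus `arguments`) is a hypothetical simplification, not any provider's wire format, and `TOOLS` and `run_tool_call` are illustrative names.

```python
from typing import Any, Callable

def add(a: float, b: float) -> float:
    """Example tool: a trivial calculator operation."""
    return a + b

# Registry of functions the model is allowed to call.
TOOLS: dict[str, Callable[..., Any]] = {"add": add}

def run_tool_call(call: dict[str, Any]) -> Any:
    """Execute a structured tool call and return its result."""
    tool = TOOLS.get(call["name"])
    if tool is None:
        raise KeyError(f"unknown tool: {call['name']}")
    return tool(**call["arguments"])

# Pretend the model emitted this structured request.
print(run_tool_call({"name": "add", "arguments": {"a": 2, "b": 3}}))
```

In a full loop, the result is sent back to the model as a new message so it can continue from the tool's output.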

Extended thinking

extended-thinking

Extended thinking is a mode where the model spends extra inference compute on a hidden internal reasoning trace before producing its visible answer. Trades latency and cost for accuracy on hard proble...

reasoning mode chain of thought thinking budget

Mixture of Experts (MoE)

mixture-of-experts

A Mixture of Experts (MoE) model splits its parameters across many "expert" sub-networks, but at inference only a small subset of them activate per token. Lets you build huge models (trillions of para...

mixture of experts MoE sparse model
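
The "only a small subset activate per token" step is handled by a router. One common scheme, sketched here with made-up gate scores, picks the top-k experts for a token and normalizes their scores into mixing weights; actual routing details vary between models, and `route` is an illustrative helper.

```python
import math

def route(gate_scores: list[float], top_k: int) -> dict[int, float]:
    """Pick the top_k experts for one token and return their mixing weights."""
    chosen = sorted(range(len(gate_scores)), key=lambda i: gate_scores[i], reverse=True)[:top_k]
    # Softmax over the chosen experts only; the rest get no compute for this token.
    m = max(gate_scores[i] for i in chosen)
    exps = {i: math.exp(gate_scores[i] - m) for i in chosen}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

# Four experts, two active per token (scores are invented).
print(route([0.1, 2.0, -1.0, 2.0], top_k=2))
```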

Distillation

distillation

Distillation is the process of training a small "student" model to mimic the outputs of a larger "teacher" model. Result: a much smaller, faster, cheaper model that captures most of the teacher's capa...

knowledge distillation model distillation

Quantization

quantization

Quantization is the process of reducing the precision of a model's parameters (e.g. from 32-bit floats down to 8-bit or 4-bit integers) to shrink memory and speed up inference. A 4-bit quantized 70B m...

8-bit quantization 4-bit quantization INT8
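
The memory saving is straightforward arithmetic: parameters times bits per parameter. This sketch counts weights only, in decimal gigabytes, and ignores activations, the KV cache, and any per-block quantization metadata; `weight_memory_gb` is an illustrative helper.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9

for bits in (32, 8, 4):
    print(f"70B parameters at {bits}-bit: {weight_memory_gb(70e9, bits):.0f} GB")
```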

Multimodal

multimodal

A multimodal model can process more than one kind of input: text + images, text + audio, sometimes text + video. The model is still a transformer at its core, but its tokenizer has been extended to en...

multi-modal vision-language model VLM

MMLU

mmlu

MMLU (Massive Multitask Language Understanding) is a benchmark of ~16,000 multiple-choice questions across 57 academic subjects: history, biology, law, mathematics, etc. Reported as a percentage. The...

Massive Multitask Language Understanding MMLU benchmark

HumanEval

humaneval

HumanEval is a Python coding benchmark from OpenAI: 164 hand-written programming problems where the model has to write a function that passes a hidden test suite. Reported as "pass@1" (does the first...

HumanEval benchmark code generation benchmark
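
Under the usual reading of pass@1, the score is the share of problems whose first generated solution passes the tests. A minimal sketch with invented results; `pass_at_1` is an illustrative helper, not the official evaluation harness.

```python
def pass_at_1(first_attempt_passed: list[bool]) -> float:
    """Fraction of problems solved by the first generated solution."""
    if not first_attempt_passed:
        raise ValueError("no results to score")
    return sum(first_attempt_passed) / len(first_attempt_passed)

# Invented outcomes for four problems: three solved on the first try.
print(pass_at_1([True, False, True, True]))
```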

Evals

evals

Evals are tests for AI models. Anything from a single hand-written prompt to a full benchmark suite of thousands of cases. The discipline of "AI evals" became its own thing in 2023-2024 because labs a...

evaluations eval suite

Calibration

calibration

Calibration is whether a model's expressed confidence matches its actual accuracy. A perfectly calibrated model that says "I'm 80% sure" is right 80% of the time. Modern frontier models are imperfectl...

confidence calibration probability calibration
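
Calibration can be checked by grouping answers by stated confidence and comparing each group's confidence with its observed accuracy. A sketch over invented data; `accuracy_by_confidence` is an illustrative helper that buckets confidence to one decimal place.

```python
from collections import defaultdict

def accuracy_by_confidence(predictions: list[tuple[float, bool]]) -> dict[float, float]:
    """Map each stated confidence level to the observed fraction correct."""
    buckets: dict[float, list[bool]] = defaultdict(list)
    for confidence, correct in predictions:
        buckets[round(confidence, 1)].append(correct)
    return {c: sum(v) / len(v) for c, v in sorted(buckets.items())}

# Invented results: five answers given at "80% sure", four of them right.
predictions = [(0.8, True)] * 4 + [(0.8, False)]
print(accuracy_by_confidence(predictions))
```

A well-calibrated model shows accuracy close to the confidence in every bucket; accuracy consistently below stated confidence indicates overconfidence.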

Benchmark

benchmark

A benchmark is a standardized test for AI models: a fixed dataset of inputs paired with a fixed scoring function. Same test, different models, comparable numbers. The headline score on benchmarks like...

capability benchmark leaderboard

Alignment

alignment

Alignment is the research problem of making AI systems behave in accordance with human values and intentions, especially as systems become more capable. The technical agenda includes RLHF, constitutio...

AI alignment value alignment

Jailbreak

jailbreak

A jailbreak is a prompt or sequence of prompts designed to make a model bypass its safety guardrails: produce content it was trained to refuse, ignore prior instructions, or adopt a persona that disab...

jailbreaking prompt injection bypass

Sycophancy

sycophancy

Sycophancy is the trained tendency of a model to agree with the user's stated views, cave on correct answers when pushed, and tell people what they want to hear. A documented side effect of RLHF: huma...

sycophantic behavior agreement bias

Red teaming

red-teaming

Red teaming is structured adversarial testing: people (or other AI systems) try hard to make a model misbehave. The findings inform safety training, prompt engineering, and deployment decisions. Every...

adversarial testing red team

Inference

inference

Inference is what happens when you actually use a trained model: running input through the network to get output. As distinct from training (which produces the model). Inference is what you pay for pe...

model inference serving

Training

training

Training is the process of adjusting a model's parameters so it predicts the training corpus correctly. The expensive, one-time-ish phase that produces the weights you later run inference on. A fronti...

pre-training model training