Kevin Champlin

State of the Art

The state of frontier AI,
with the receipts attached.

What every frontier model costs, how much it can read in one go, and how the curve has bent over the last eighteen months. Numbers come from a `model_snapshots` table refreshed against vendor docs, never hardcoded in the views.

Snapshot 2026-05-06 · 3 vendors · 8 models tracked · cheapest current input $0.075/Mtok

Frontier models tracked: 8 (across 3 vendors)
Cheapest input: $0.075/Mtok (Gemini 3 Flash)
Largest context: 2.0M tokens (Gemini 3 Pro)
Cache hit savings: ~90% cost reduction vs fresh input
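As a sketch of what "read from a `model_snapshots` table, never hardcoded in the views" might look like in practice: the schema, column names, and sample rows below are hypothetical, with the prices taken from this snapshot.

```python
import sqlite3

# Hypothetical schema for the `model_snapshots` table described above;
# the real column names are an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE model_snapshots (
        snapshot_date  TEXT,
        model          TEXT,
        vendor         TEXT,
        input_per_mtok REAL,
        context_tokens INTEGER
    )
""")
conn.executemany(
    "INSERT INTO model_snapshots VALUES (?, ?, ?, ?, ?)",
    [
        ("2026-05-06", "gemini-3-flash", "google", 0.075, 1_000_000),
        ("2026-05-06", "gemini-3-pro",   "google", 1.25,  2_000_000),
        ("2026-05-06", "gpt-5",          "openai", 2.50,    256_000),
    ],
)

# The stat cards are computed from the snapshot at render time.
# SQLite returns the bare column from the row where MIN() occurs.
cheapest = conn.execute(
    "SELECT model, MIN(input_per_mtok) FROM model_snapshots "
    "WHERE snapshot_date = '2026-05-06'"
).fetchone()
print(cheapest)  # ('gemini-3-flash', 0.075)
```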

Cost per million input tokens, over time

Capability went up. Price went down.

Each line is one of a vendor's frontier or budget tiers. The y-axis is log scale: the plotted costs span more than two orders of magnitude. Read the slope, not the absolute numbers.

Chart data — input price at launch, $/Mtok:

OpenAI
  GPT-4               $30.00    2023-03-14
  GPT-4o              $2.50     2024-05-13
  GPT-5               $2.50     2025-11-15
  GPT-5 Mini          $0.15     2025-12-05
  GPT-5 Turbo         $0.50     2026-02-10

Anthropic
  Claude 3.5 Sonnet   $3.00     2024-06-20
  Claude 3.5 Haiku    $1.00     2024-10-22
  Claude Sonnet 4.6   $3.00     2025-09-15
  Claude Haiku 4.5    $0.80     2025-10-01
  Claude Opus 4.7     $15.00    2026-01-20

Google
  Gemini 2 Flash      $0.075    2025-02-05
  Gemini 2 Pro        $1.25     2025-02-15
  Gemini 3 Pro        $1.25     2025-12-15
  Gemini 3 Flash      $0.075    2026-01-12
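Reading the slope: a quick sketch of the annual fold-change implied by two points on the OpenAI line above (GPT-4 at launch versus GPT-5 Mini at launch; dates and prices from the chart data).

```python
from datetime import date

# Two points from the chart above.
p0, d0 = 30.00, date(2023, 3, 14)   # GPT-4, $/Mtok input at launch
p1, d1 = 0.15, date(2025, 12, 5)    # GPT-5 Mini

years = (d1 - d0).days / 365.25
fold = p0 / p1                       # total price drop: 200x
per_year = fold ** (1 / years)       # implied annual fold-change

print(f"{fold:.0f}x over {years:.2f} years ≈ {per_year:.1f}x cheaper per year")
```

Comparing a flagship to a later budget tier overstates the pure like-for-like decline, but it is the drop a cost-conscious buyer actually experienced.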

Context window, current frontier

The page got bigger. A lot bigger.

What the model can hold in mind on a single request. Eighteen months ago, 32K was generous. Today, 1M is normal and 2M is in production.

Gemini 3 Pro        Google      2.0M
Claude Opus 4.7     Anthropic   1.0M
Gemini 3 Flash      Google      1.0M
GPT-5               OpenAI      256K
Claude Sonnet 4.6   Anthropic   200K
Claude Haiku 4.5    Anthropic   200K
GPT-5 Turbo         OpenAI      128K
GPT-5 Mini          OpenAI      128K

Current frontier and tiers

Side-by-side, with the bill.

Input price, output price, cache-read price, context window, and a few public benchmarks where vendors publish them. Use this to pick the right model for the job, not the loudest one.

Model               ID                  Vendor     Context  In/Mtok  Out/Mtok  Cache read  MMLU
Claude Opus 4.7     claude-opus-4-7     Anthropic  1.0M     $15.00   $75.00    $1.50       92.1
Claude Sonnet 4.6   claude-sonnet-4-6   Anthropic  200K     $3.00    $15.00    $0.30       89.3
Claude Haiku 4.5    claude-haiku-4-5    Anthropic  200K     $0.80    $4.00     $0.08       82.1
Gemini 3 Pro        gemini-3-pro        Google     2.0M     $1.25    $5.00     —           90.8
Gemini 3 Flash      gemini-3-flash      Google     1.0M     $0.075   $0.30     —           81.7
GPT-5               gpt-5               OpenAI     256K     $2.50    $10.00    $0.25       91.4
GPT-5 Turbo         gpt-5-turbo         OpenAI     128K     $0.50    $1.50     $0.05       86.2
GPT-5 Mini          gpt-5-mini          OpenAI     128K     $0.15    $0.60     $0.015      78.4

Snapshot 2026-05-06 · prices in USD per million tokens · cache-read shown where the vendor publishes it
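Picking the right model for the job is mostly this arithmetic. A minimal sketch, using input/output prices from the table above and a hypothetical workload of 50K tokens in, 2K tokens out:

```python
def job_cost(in_tok, out_tok, in_price, out_price):
    """USD cost of one request; prices are in $/Mtok."""
    return in_tok / 1e6 * in_price + out_tok / 1e6 * out_price

# Hypothetical workload: 50K input tokens, 2K output tokens.
print(f"GPT-5:            ${job_cost(50_000, 2_000, 2.50, 10.00):.4f}")
print(f"Claude Haiku 4.5: ${job_cost(50_000, 2_000, 0.80, 4.00):.4f}")
print(f"Gemini 3 Flash:   ${job_cost(50_000, 2_000, 0.075, 0.30):.4f}")
```

For input-heavy jobs the input price dominates, which is why the budget tiers win by a factor of 30 or more here despite similar output-price ratios.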

The number that ate the bill

Prompt caching turns the system prompt into a 10% line item.

On the second turn of any conversation, the long static preamble (system prompt, persona, retrieved corpus) reads from the prompt cache at roughly 10% of fresh-input price. On a long session, that quietly cuts your input bill by 70 to 90 percent.

Read the full glossary entry
Fresh input   100%
Cache write   125%
Cache read    10%
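The 70-to-90-percent claim is simple arithmetic. A sketch using the 125%/10% multipliers above, Claude Sonnet 4.6's $3/Mtok input price from the table, and an assumed 20K-token static preamble over a 30-turn session (it ignores growing conversation history, so it is deliberately simplified):

```python
IN_PRICE = 3.00                        # $/Mtok fresh input (Claude Sonnet 4.6)
CACHE_WRITE, CACHE_READ = 1.25, 0.10   # multipliers vs fresh-input price

def session_input_cost(preamble_tok, per_turn_tok, turns, cached):
    """Total input cost (USD) for a session; new per-turn tokens always pay full price."""
    if not cached:
        return (preamble_tok + per_turn_tok) * turns / 1e6 * IN_PRICE
    # Turn 1 writes the preamble to the cache at 125%; later turns read it at 10%.
    first = (preamble_tok * CACHE_WRITE + per_turn_tok) / 1e6 * IN_PRICE
    rest = (preamble_tok * CACHE_READ + per_turn_tok) * (turns - 1) / 1e6 * IN_PRICE
    return first + rest

fresh = session_input_cost(20_000, 1_000, 30, cached=False)
cached = session_input_cost(20_000, 1_000, 30, cached=True)
print(f"savings: {1 - cached / fresh:.0%}")  # savings: 82%
```

The savings land near the top of the 70-to-90-percent range whenever the static preamble dwarfs the per-turn input, and shrink as per-turn content grows.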

Want to see what these numbers mean in practice?

Talk to one. Watch the meter.

Refreshed monthly · /cost-of-mind →