Kevin Champlin

State of the Art

The state of frontier AI,
with the receipts attached.

What every frontier model costs, how much it can read in one go, and how the curve has bent over the last eighteen months. Numbers come from a `model_snapshots` table refreshed against vendor docs, never hardcoded in the views.

Snapshot 2026-05-06 · 3 vendors · 8 models tracked · cheapest current input $0.075/Mtok

Frontier models tracked: 8 (across 3 vendors)
Cheapest input: $0.075/Mtok (Gemini 3 Flash)
Largest context: 2.0M tokens (Gemini 3 Pro)
Cache hit savings: ~90% cost reduction vs fresh input
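As a sketch of what "read from a `model_snapshots` table, never hardcoded in the views" might look like in practice: the schema, column names, and sample rows below are hypothetical, with the prices taken from this snapshot.

```python
import sqlite3

# Hypothetical schema for the `model_snapshots` table described above;
# the real column names are an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE model_snapshots (
        snapshot_date  TEXT,
        model          TEXT,
        vendor         TEXT,
        input_per_mtok REAL,
        context_tokens INTEGER
    )
""")
conn.executemany(
    "INSERT INTO model_snapshots VALUES (?, ?, ?, ?, ?)",
    [
        ("2026-05-06", "gemini-3-flash", "google", 0.075, 1_000_000),
        ("2026-05-06", "gemini-3-pro",   "google", 1.25,  2_000_000),
        ("2026-05-06", "gpt-5",          "openai", 2.50,    256_000),
    ],
)

# The stat cards are computed from the snapshot at render time.
# SQLite returns the bare column from the row where MIN() occurs.
cheapest = conn.execute(
    "SELECT model, MIN(input_per_mtok) FROM model_snapshots "
    "WHERE snapshot_date = '2026-05-06'"
).fetchone()
print(cheapest)  # ('gemini-3-flash', 0.075)
```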

Cost per million input tokens, over time

Capability went up. Price went down.

Each line is one of a vendor's frontier or budget tiers. The y-axis is log scale: the plotted costs span more than two orders of magnitude. Read the slope, not the absolute numbers.

Chart data — input price at launch, $/Mtok:

OpenAI
  GPT-4               $30.00    2023-03-14
  GPT-4o              $2.50     2024-05-13
  GPT-5               $2.50     2025-11-15
  GPT-5 Mini          $0.15     2025-12-05
  GPT-5 Turbo         $0.50     2026-02-10

Anthropic
  Claude 3.5 Sonnet   $3.00     2024-06-20
  Claude 3.5 Haiku    $1.00     2024-10-22
  Claude Sonnet 4.6   $3.00     2025-09-15
  Claude Haiku 4.5    $0.80     2025-10-01
  Claude Opus 4.7     $15.00    2026-01-20

Google
  Gemini 2 Flash      $0.075    2025-02-05
  Gemini 2 Pro        $1.25     2025-02-15
  Gemini 3 Pro        $1.25     2025-12-15
  Gemini 3 Flash      $0.075    2026-01-12
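Reading the slope: a quick sketch of the annual fold-change implied by two points on the OpenAI line above (GPT-4 at launch versus GPT-5 Mini at launch; dates and prices from the chart data).

```python
from datetime import date

# Two points from the chart above.
p0, d0 = 30.00, date(2023, 3, 14)   # GPT-4, $/Mtok input at launch
p1, d1 = 0.15, date(2025, 12, 5)    # GPT-5 Mini

years = (d1 - d0).days / 365.25
fold = p0 / p1                       # total price drop: 200x
per_year = fold ** (1 / years)       # implied annual fold-change

print(f"{fold:.0f}x over {years:.2f} years ≈ {per_year:.1f}x cheaper per year")
```

Comparing a flagship to a later budget tier overstates the pure like-for-like decline, but it is the drop a cost-conscious buyer actually experienced.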

Context window, current frontier

The page got bigger. A lot bigger.

What the model can hold in mind on a single request. Eighteen months ago, 32K was generous. Today, 1M is normal and 2M is in production.

Gemini 3 Pro        Google      2.0M
Claude Opus 4.7     Anthropic   1.0M
Gemini 3 Flash      Google      1.0M
GPT-5               OpenAI      256K
Claude Sonnet 4.6   Anthropic   200K
Claude Haiku 4.5    Anthropic   200K
GPT-5 Turbo         OpenAI      128K
GPT-5 Mini          OpenAI      128K

Current frontier and tiers

Side-by-side, with the bill.

Input price, output price, cache-read price, context window, and a few public benchmarks where vendors publish them. Use this to pick the right model for the job, not the loudest one.

Model               ID                  Vendor     Context  In/Mtok  Out/Mtok  Cache read  MMLU
Claude Opus 4.7     claude-opus-4-7     Anthropic  1.0M     $15.00   $75.00    $1.50       92.1
Claude Sonnet 4.6   claude-sonnet-4-6   Anthropic  200K     $3.00    $15.00    $0.30       89.3
Claude Haiku 4.5    claude-haiku-4-5    Anthropic  200K     $0.80    $4.00     $0.08       82.1
Gemini 3 Pro        gemini-3-pro        Google     2.0M     $1.25    $5.00     —           90.8
Gemini 3 Flash      gemini-3-flash      Google     1.0M     $0.075   $0.30     —           81.7
GPT-5               gpt-5               OpenAI     256K     $2.50    $10.00    $0.25       91.4
GPT-5 Turbo         gpt-5-turbo         OpenAI     128K     $0.50    $1.50     $0.05       86.2
GPT-5 Mini          gpt-5-mini          OpenAI     128K     $0.15    $0.60     $0.015      78.4

Snapshot 2026-05-06 · prices in USD per million tokens · cache-read shown where the vendor publishes it
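Picking the right model for the job is mostly this arithmetic. A minimal sketch, using input/output prices from the table above and a hypothetical workload of 50K tokens in, 2K tokens out:

```python
def job_cost(in_tok, out_tok, in_price, out_price):
    """USD cost of one request; prices are in $/Mtok."""
    return in_tok / 1e6 * in_price + out_tok / 1e6 * out_price

# Hypothetical workload: 50K input tokens, 2K output tokens.
print(f"GPT-5:            ${job_cost(50_000, 2_000, 2.50, 10.00):.4f}")
print(f"Claude Haiku 4.5: ${job_cost(50_000, 2_000, 0.80, 4.00):.4f}")
print(f"Gemini 3 Flash:   ${job_cost(50_000, 2_000, 0.075, 0.30):.4f}")
```

For input-heavy jobs the input price dominates, which is why the budget tiers win by a factor of 30 or more here despite similar output-price ratios.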

The number that ate the bill

Prompt caching turns the system prompt into a 10% line item.

On the second turn of any conversation, the long static preamble (system prompt, persona, retrieved corpus) reads from the prompt cache at roughly 10% of fresh-input price. On a long session, that quietly cuts your input bill by 70 to 90 percent.

Read the full glossary entry
Fresh input   100%
Cache write   125%
Cache read    10%
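The 70-to-90-percent claim is simple arithmetic. A sketch using the 125%/10% multipliers above, Claude Sonnet 4.6's $3/Mtok input price from the table, and an assumed 20K-token static preamble over a 30-turn session (it ignores growing conversation history, so it is deliberately simplified):

```python
IN_PRICE = 3.00                        # $/Mtok fresh input (Claude Sonnet 4.6)
CACHE_WRITE, CACHE_READ = 1.25, 0.10   # multipliers vs fresh-input price

def session_input_cost(preamble_tok, per_turn_tok, turns, cached):
    """Total input cost (USD) for a session; new per-turn tokens always pay full price."""
    if not cached:
        return (preamble_tok + per_turn_tok) * turns / 1e6 * IN_PRICE
    # Turn 1 writes the preamble to the cache at 125%; later turns read it at 10%.
    first = (preamble_tok * CACHE_WRITE + per_turn_tok) / 1e6 * IN_PRICE
    rest = (preamble_tok * CACHE_READ + per_turn_tok) * (turns - 1) / 1e6 * IN_PRICE
    return first + rest

fresh = session_input_cost(20_000, 1_000, 30, cached=False)
cached = session_input_cost(20_000, 1_000, 30, cached=True)
print(f"savings: {1 - cached / fresh:.0%}")  # savings: 82%
```

The savings land near the top of the 70-to-90-percent range whenever the static preamble dwarfs the per-turn input, and shrink as per-turn content grows.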

Want to see what these numbers mean in practice?

Talk to one. Watch the meter.

Refreshed monthly · /cost-of-mind →