Frontier AI, with the cost meter on

Talk to a frontier AI.
See exactly what it costs.

An interactive tour of how production AI really works. Open the chat, ask anything, and watch input tokens, output tokens, and dollar cost climb in real time. Then browse the glossary, scan capabilities, or read the essays.

Talk to the AI · Browse the glossary

Talk to a frontier AI. Ask anything. Watch it think, watch it fail, watch the meter run.

Live. 10 turns / day, hard cap on the wallet.

Who's behind the meter

A working AI engineer.
Not an AI influencer.

I'm Kevin Champlin. I started writing code in 1998 and I've spent twenty years shipping production systems, the last few going deep on applied AI. I write the code, I deploy it, and I answer the alerts at 2am. I don't advise on what AI other people should build. I build it, ship it, and stay accountable for what runs.

The proof is the page you're on. The live token meter, the per-visitor budget gate, the prompt caching, the streaming chat, the forty-term glossary, all of it is hand-built and running on a real wallet with a hard cap. This isn't a deck about AI. It's a working AI system that narrates its own cost.

role: senior AI engineer
since: 1998
models: Claude · GPT · Gemini

The full story · Hire me

Shipped, not slideware

Production AI I've built and run.

full portfolio →

Chat Ironcrest

RAG knowledge engine

Turns messy PDFs, SOPs, and web pages into a grounded knowledge base. Answers only from approved sources, with timestamped citations. Per-client isolation and audit trails.

RAG · Embeddings · Citations

AI Enrollment Assistant

Grounded RAG

Answers prospective-student questions strictly from the official catalog, shows its sources, and hands off to a human when it cannot answer. Built on Gemini / Vertex AI.

Gemini · Vertex AI · Guardrails

The Mirror

Web + native iOS

An AI reflection app with long-term memory and pattern recognition. Built to be HIPAA-compliant, with crisis-language detection and optional clinician sharing.

Memory · iOS · HIPAA

Vantage AI

Self-tuning ensemble

A three-model ensemble (momentum, mean-reversion, sentiment) for equity analysis, with weights that are retuned every night based on each model's accuracy.

Ensemble · Self-tuning · Laravel

Diamond AI

Auto-learning pipeline

An outcome-prediction engine that grades itself hourly and retunes overnight. The evaluation loop is the product.

Evals · Auto-grade · Pipeline

This site

you're looking at it

Live cost telemetry, per-visitor budget gating, prompt caching, and token-by-token streaming over Reverb websockets. Cost engineering you can watch.

Streaming · Caching · Budget gate

Career Readiness Report

Client: InternBridge

Generates consistent, personalized career-readiness feedback for students from their experiential-learning data. Reliable, templated AI output at scale.

LLM · Templates · Higher-ed

Claude Skills

Open source

MIT-licensed Claude Code skills, published for anyone to use. Tooling for the way I actually work alongside coding agents.

OSS · Agents · Claude

The AI engineering toolkit

The parts nobody puts in a demo.

Anyone can wrap an API call. The hard, unglamorous, important work is everything around it: making answers trustworthy, cheap, and safe. That is where I live.

RAG & retrieval

Chunking that respects document structure, embeddings, vector search, and grounded answers that cite their exact source. If the answer is not in the corpus, the system says so instead of guessing.
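The refuse-to-guess behavior can be sketched in a few lines. This is a minimal illustration, not this site's code: it assumes retrieval has already produced scored chunks, and the `Chunk` type, the `MIN_SCORE` cutoff, and the function names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

MIN_SCORE = 0.75  # hypothetical similarity cutoff; tune per corpus


@dataclass(frozen=True)
class Chunk:
    source: str   # where the text came from, e.g. a file name plus section
    text: str
    score: float  # similarity to the question, higher is closer


def best_grounding(chunks: list[Chunk], min_score: float = MIN_SCORE) -> Optional[Chunk]:
    """Return the strongest chunk, or None when nothing clears the cutoff."""
    eligible = [c for c in chunks if c.score >= min_score]
    return max(eligible, key=lambda c: c.score) if eligible else None


def answer_or_refuse(chunks: list[Chunk]) -> str:
    hit = best_grounding(chunks)
    if hit is None:
        # Nothing in the corpus supports an answer, so say so instead of guessing.
        return "I can't find that in the approved sources."
    # A real system would hand hit.text to the model; this only shows the citation path.
    return f"{hit.text} [source: {hit.source}]"
```

The important design choice is that the refusal happens before any model call, so a weak retrieval result never reaches the generator.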

Evaluation harnesses

Self-tuning eval loops, shadow-compare grading against ground truth, and regression checks. In high-stakes AI the eval set is the product. Most teams skip it. I start there.
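As a rough sketch of grading against ground truth and catching a regression: the example below assumes exact-match scoring and a hypothetical `tolerance` of two points. Real harnesses usually need fuzzier grading than string equality.

```python
def accuracy(predictions: list[str], ground_truth: list[str]) -> float:
    """Fraction of predictions that exactly match the labeled answers."""
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must be the same length")
    if not ground_truth:
        raise ValueError("eval set is empty")
    correct = sum(p == g for p, g in zip(predictions, ground_truth))
    return correct / len(ground_truth)


def is_regression(baseline: float, candidate: float, tolerance: float = 0.02) -> bool:
    """True when the candidate scores worse than the baseline by more than tolerance."""
    return (baseline - candidate) > tolerance
```

In a shadow-compare setup, the baseline and the candidate would both run over the same eval set, and the candidate ships only if `is_regression` comes back false.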

Multi-provider orchestration

Claude, OpenAI, and Gemini behind one interface, routed per task. Vendor-agnostic by design, so no single model outage or price hike owns the roadmap.
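One way to picture "one interface, routed per task" is a small router with fallback. This is a sketch under assumptions: the `Router` class is hypothetical, and providers are plain callables standing in for real SDK clients, so no vendor API is called here.

```python
from typing import Callable, Dict, List

# A provider is anything that maps a prompt to a completion string.
Provider = Callable[[str], str]


class Router:
    def __init__(self, routes: Dict[str, List[str]], providers: Dict[str, Provider]):
        self.routes = routes        # task name -> provider names, in preference order
        self.providers = providers  # provider name -> callable

    def complete(self, task: str, prompt: str) -> str:
        errors = []
        for name in self.routes[task]:
            try:
                return self.providers[name](prompt)
            except Exception as exc:  # outage, rate limit, timeout, and so on
                errors.append(f"{name}: {exc}")
        raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Because callers only ever name a task, swapping the preferred model for that task is a one-line change to `routes`.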

Cost & latency engineering

Prompt caching, token budgeting, and per-request cost telemetry. The meter in the corner of this page is this exact skill, running live on a real wallet.
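Per-request cost is arithmetic over token counts. A minimal sketch, assuming per-million-token pricing: the `Pricing` and `Usage` types are hypothetical, the prices in the usage note are placeholders rather than any provider's real rates, and the 10 percent cache-read rate is the rough figure quoted in the glossary on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_mtok: float                # USD per million fresh input tokens
    output_per_mtok: float               # USD per million output tokens
    cache_read_multiplier: float = 0.10  # cache reads billed at roughly 10% of input


@dataclass(frozen=True)
class Usage:
    input_tokens: int           # input tokens processed fresh
    output_tokens: int
    cache_read_tokens: int = 0  # input tokens served from the prompt cache


def request_cost(usage: Usage, pricing: Pricing) -> float:
    """Dollar cost of one request under the assumed pricing."""
    fresh = usage.input_tokens * pricing.input_per_mtok
    cached = usage.cache_read_tokens * pricing.input_per_mtok * pricing.cache_read_multiplier
    output = usage.output_tokens * pricing.output_per_mtok
    return (fresh + cached + output) / 1_000_000
```

With placeholder prices of $3 input and $15 output per million tokens, a turn with 1,000 fresh input tokens, 10,000 cache-read tokens, and 500 output tokens comes to about $0.0135. The same 10,000 tokens sent uncached would have cost ten times as much on their own.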

Agentic systems

Tool use, function calling, and multi-step autonomous loops, with the guardrails and kill-switches that keep "autonomous" from quietly meaning "unaccountable."
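The guardrails can be as simple as a step budget and a kill switch wrapped around the loop. In this sketch `step` is a hypothetical callable standing in for one model-plus-tool iteration, and `should_stop` stands in for whatever external kill switch a real system checks.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Outcome:
    status: str            # "done", "killed", or "step_limit"
    answer: Optional[str]  # set only when status is "done"
    steps: int             # how many steps actually ran


def run_agent(
    step: Callable[[int], Optional[str]],
    max_steps: int = 5,
    should_stop: Callable[[], bool] = lambda: False,
) -> Outcome:
    """Run step(i) until it returns an answer, the kill switch trips, or the budget runs out."""
    for i in range(max_steps):
        if should_stop():          # checked before every step, so a kill takes effect promptly
            return Outcome("killed", None, i)
        result = step(i)
        if result is not None:     # the step produced a final answer
            return Outcome("done", result, i + 1)
    return Outcome("step_limit", None, max_steps)
```

Every exit path returns an explicit status, which is what keeps the loop accountable: the caller can always tell whether the agent finished, was stopped, or ran out of budget.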

Safety & governance

Refuse-to-guess behavior, PII handling, audit logging, multi-tenant isolation, and HIPAA-aware data flows. Built in from the first commit, not bolted on later.

Want the deep version? The glossary defines every term above, and the essays put them to work.

Three things this site is

A demo, a glossary, and an honest meter.

Talk to it

Hit the chat. Watch the response stream in. See input tokens, output tokens, and cost increment in real time. Every turn ships you a receipt.

to the chat →

Learn the words

Tokens. Cache reads. Stop reasons. Hallucination versus confabulation. The glossary turns every metric on the screen into a teaching moment.

browse the glossary →

Watch the meter

The corner pill shows today's research budget in real time. Hard cap. No surprises. The site refuses to hide what it costs to run.

see /cost-of-mind →
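A hard cap like this can be sketched as a gate that every turn must pass. The version below is an in-memory illustration only: `BudgetGate` is a hypothetical name, treating the stated 10 turns per day as a per-visitor limit is an assumption, and a real deployment would need persistent storage and a daily reset.

```python
from dataclasses import dataclass, field


@dataclass
class BudgetGate:
    daily_cap_usd: float         # hard cap on total spend for the day
    turns_per_visitor: int = 10  # assumed to be a per-visitor daily limit
    spent_usd: float = 0.0
    turns: dict[str, int] = field(default_factory=dict)

    def allow(self, visitor_id: str) -> bool:
        """True only if both the wallet cap and the visitor's turn limit have room."""
        if self.spent_usd >= self.daily_cap_usd:
            return False
        return self.turns.get(visitor_id, 0) < self.turns_per_visitor

    def record(self, visitor_id: str, cost_usd: float) -> None:
        """Book one completed turn and its cost."""
        self.turns[visitor_id] = self.turns.get(visitor_id, 0) + 1
        self.spent_usd += cost_usd
```

Calling `allow` before the model request and `record` after it means the cap is enforced on the request path itself, not reconciled after the fact.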

From the glossary

The vocabulary of frontier AI.

all 40 terms →

Tokens

tokens

Tokens are the units a language model reads and writes. Roughly four characters of English per token, give or take. Every cost and every limit on this site is d...
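The four-characters rule of thumb is enough for a quick estimate. This sketch is not a tokenizer: `estimate_tokens` is a hypothetical helper, and real counts vary by model and language.

```python
import math

CHARS_PER_TOKEN = 4  # rough English average; actual tokenizers differ


def estimate_tokens(text: str) -> int:
    """Back-of-envelope token count from character length."""
    if not text:
        return 0
    return math.ceil(len(text) / CHARS_PER_TOKEN)
```

By this estimate a 400-character message is about 100 tokens. For billing, trust the usage numbers the API returns, not this approximation.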

Context window

context-window

The context window is the maximum amount of text, in tokens, that a model can consider in a single request. Everything you send in plus everything the model wri...

Prompt caching

prompt-caching

Prompt caching is a feature that lets the model store a static prefix of your prompt (system instructions, retrieved documents, conversation history) and re-rea...

Cache read

cache-read

Cache read tokens are input tokens that were served from the prompt cache rather than processed fresh. They are billed at roughly 10 percent of normal input pri...

Cache creation

cache-creation

Cache creation tokens are input tokens that were written to the prompt cache for reuse on later turns. They cost slightly more than normal input on the way in (...

Stop reason

stop-reason

The `stop_reason` field in an API response tells you why the model stopped generating. Common values: `end_turn` (the model decided it was done), `max_tokens` (it...
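A caller would typically branch on this field. The sketch below handles only the two values named above; `describe_stop` and its return labels are hypothetical, and any other value is passed through as "other" rather than guessed at.

```python
def describe_stop(stop_reason: str) -> str:
    """Map a stop_reason value to a coarse label for display or logging."""
    if stop_reason == "end_turn":
        return "complete"   # the model finished on its own
    if stop_reason == "max_tokens":
        return "truncated"  # the output hit the token limit and was cut off
    return "other"          # anything else: handle explicitly in real code
```

The case worth surfacing to a user is "truncated", since the answer on screen is incomplete even though the request succeeded.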

Lab Notes

What's being built, in the open.

The site is the demo, including the construction site itself. Every shipped milestone, every cap tuning, every cost lesson goes here.

SHIPPED

Schema, models, glossary seed

14 tables migrated, 14 Eloquent models, glossary seeder + 10 seed terms covering side-rail metadata and editorial-spine concepts.

2026-05-06