Free tool
Prompt Token & Cost Inspector
Paste a prompt and see what it costs before you ship it: a token count from a real BPE tokenizer (exact for the GPT-4o family, within a few percent for the others), the price per call and per month across Claude, GPT-4o and Gemini, and how much of each model's context window it uses. It runs in your browser, so the prompt never leaves the page.
Input tokens (estimate): 90
64 words · 357 chars · 18% whitespace
| Model | Cost / call | Cost / month | ctx |
|---|---|---|---|
| Claude Haiku | $0.0021 | $62.16 | <0.1% |
| Claude Sonnet | $0.0078 | $233.10 | <0.1% |
| Claude Opus | $0.04 | $1,165.50 | <0.1% |
| GPT-4o mini | $0.00031 | $9.41 | <0.1% |
| GPT-4o | $0.0052 | $156.75 | <0.1% |
| Gemini 1.5 Flash | $0.00016 | $4.70 | <0.1% |
| Gemini 1.5 Pro | $0.0026 | $78.38 | <0.1% |
Tokenized with o200k (GPT-4o family); counts for other models typically differ by a few percent. List prices are per 1M tokens, captured 2026-06. "ctx" = prompt as a share of the model's context window.
Prompt ballooning your bill or blowing the context window? I cut token footprint with caching, retrieval, and tighter prompts without losing quality.
Trim your token bill: book a call
Tokenized with o200k (the GPT-4o family encoding). Claude and Gemini use their own tokenizers, but for typical English and code the counts land within a few percent, which is close enough to budget and to compare prompts.
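As a rough illustration of the exact-count-with-fallback behaviour described in the FAQ, here is a minimal Python sketch. It is not the tool's own code: it assumes the `tiktoken` package and its `o200k_base` encoding as a stand-in for the in-browser tokenizer, the function names are hypothetical, and rounding the estimate up is an assumption (it happens to match the 357-character, 90-token readout above).

```python
import math

try:
    import tiktoken  # optional; assumed stand-in for the in-browser tokenizer
except ImportError:
    tiktoken = None


def estimate_tokens(text: str) -> int:
    """Quick heuristic: roughly four characters per token, rounded up."""
    return math.ceil(len(text) / 4)


def count_tokens(text: str) -> tuple[int, bool]:
    """Return (token_count, is_exact).

    Uses the o200k_base BPE encoding when tiktoken can load it,
    otherwise falls back to the four-characters-per-token estimate.
    """
    if tiktoken is not None:
        try:
            encoding = tiktoken.get_encoding("o200k_base")
            return len(encoding.encode(text)), True
        except Exception:
            # Encoding could not be used (e.g. not yet downloaded, or the
            # text contains special tokens); fall through to the estimate.
            pass
    return estimate_tokens(text), False
```

The boolean flag lets a caller label the number as "estimate" until the exact count is available, which is the same distinction the readout above makes.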
Why it matters
Every token ships on every call
A prompt you write once runs millions of times. A bloated system prompt or an over-stuffed few-shot block isn't a one-time cost; it's a tax on every request, paid in latency and dollars for the life of the feature. Seeing the monthly number next to the token count is usually what makes teams trim.
Context fit matters just as much. A prompt that quietly grows past the window doesn't error politely; it truncates, and the model starts ignoring the instructions you thought were guaranteed. Watch the context column as your retrieved context and history grow.
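The context column is simple arithmetic, sketched below in Python. The window sizes are assumptions based on commonly published limits for these model families, not values stated on this page, and the `reserved_output` headroom is a hypothetical parameter added for illustration, on the assumption that the model's reply shares the same window.

```python
# Context window sizes in tokens. Assumed values; check each provider's
# documentation for the model version you actually call.
CONTEXT_WINDOWS = {
    "Claude Sonnet": 200_000,
    "GPT-4o": 128_000,
    "Gemini 1.5 Pro": 2_000_000,
}


def context_share(prompt_tokens: int, window: int) -> float:
    """Prompt size as a fraction of the model's context window."""
    return prompt_tokens / window


def format_share(share: float) -> str:
    """Render the fraction the way the table does: tiny values as '<0.1%'."""
    return "<0.1%" if share < 0.001 else f"{share:.1%}"


def fits(prompt_tokens: int, window: int, reserved_output: int = 0) -> bool:
    """True if the prompt plus the room reserved for the reply fits."""
    return prompt_tokens + reserved_output <= window


if __name__ == "__main__":
    for model, window in CONTEXT_WINDOWS.items():
        print(model, format_share(context_share(90, window)))
```

Checking `fits` before each call, with the retrieved context and history included in `prompt_tokens`, is one way to catch growth before anything gets cut off.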
FAQ
Questions & answers
- How does the Prompt Token and Cost Inspector count tokens?
  It tokenizes your prompt with a real BPE tokenizer (the o200k encoding used by the GPT-4o family), loaded as a separate chunk on first use. While that loads, it shows a quick four-characters-per-token estimate, then swaps in the exact count.
- Are the token counts accurate for Claude and Gemini too?
  Only approximately. It uses the o200k count for every model, even though Claude and Gemini use their own tokenizers. For typical English and code the counts land within a few percent, so the cost projection is close but not exact for those models.
- Is my prompt sent anywhere?
  No. Tokenization and the cost math run entirely in your browser, and the prompt is never sent to a server. The only network fetch is the tokenizer code itself, not your text.
- Why does it show what percentage of the context window I use?
  Because a prompt that quietly exceeds the window does not error: it silently truncates, and the model starts ignoring the parts that fell off. The context percentage warns you before that happens, which matters as much as the raw token count.
- How does it work out cost per call and per month?
  It multiplies your input tokens and an estimated output length by each model's per-million input and output prices, then scales by your calls per day over 30 days. You can change the output length and call volume to fit your workload.
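The cost calculation in the last answer can be sketched in a few lines of Python. The page does not state the prices, output length, or call volume behind the sample table, so the values below are assumptions: per-million prices, 500 output tokens, and 1,000 calls per day were back-solved from the table rows and reproduce them. Treat them as illustrative, not as current list prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens


# Assumed prices, inferred from the sample table rather than quoted by the page.
PRICES = {
    "Claude Haiku": Price(0.80, 4.00),
    "Claude Sonnet": Price(3.00, 15.00),
    "Claude Opus": Price(15.00, 75.00),
    "GPT-4o mini": Price(0.15, 0.60),
    "GPT-4o": Price(2.50, 10.00),
    "Gemini 1.5 Flash": Price(0.075, 0.30),
    "Gemini 1.5 Pro": Price(1.25, 5.00),
}


def cost_per_call(input_tokens: int, output_tokens: int, price: Price) -> float:
    """Input and output tokens, each billed at its own per-million rate."""
    return (input_tokens * price.input_per_m
            + output_tokens * price.output_per_m) / 1_000_000


def cost_per_month(per_call: float, calls_per_day: int, days: int = 30) -> float:
    """Scale one call's cost by daily volume over a 30-day month."""
    return per_call * calls_per_day * days


if __name__ == "__main__":
    # 90 input tokens as in the readout; 500 output tokens and
    # 1,000 calls/day are assumed defaults.
    for model, price in PRICES.items():
        per_call = cost_per_call(90, 500, price)
        print(f"{model}: ${per_call:.5f} / call, "
              f"${cost_per_month(per_call, 1000):,.2f} / month")
```

With these assumed inputs, Claude Sonnet comes to about $0.0078 per call and $233.10 per month. At a short prompt like this one, the output tokens dominate the bill, so the output-length setting matters more than trimming the prompt.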
Token bill climbing faster than usage?
I'll find where the tokens go (bloated prompts, context you can cache, retrieval you can tighten) and cut the bill without losing answer quality. Book a call, or leave your email.
Prefer proof first? See how this plays out in real case studies →