Question 1

How does the Prompt Token and Cost Inspector count tokens?

Accepted Answer

It tokenizes your prompt with a real BPE tokenizer (the o200k encoding used by the GPT-4o family), which is loaded as a separate chunk on first use. While that chunk loads, it shows a quick estimate of four characters per token, then swaps in the exact count.
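A minimal sketch of the estimate-then-exact pattern, assuming the tokenizer chunk is exposed through a loader function. `ExactCounter`, `once`, and `countTokens` are hypothetical names, and the loader stands in for a dynamic `import()` of whatever BPE library the tool actually bundles.

```typescript
// Hypothetical: an exact counter for a string, e.g. backed by an o200k BPE encoder.
type ExactCounter = (text: string) => number;

// Quick heuristic shown while the tokenizer chunk is still loading.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Memoizes the loader so the tokenizer chunk is fetched only on first use.
function once<T>(load: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (!pending) pending = load();
    return pending;
  };
}

// Reports the estimate immediately, then the exact count once the loader resolves.
async function countTokens(
  text: string,
  loadExactCounter: () => Promise<ExactCounter>,
  onCount: (count: number, exact: boolean) => void,
): Promise<void> {
  onCount(estimateTokens(text), false);
  const exact = await loadExactCounter();
  onCount(exact(text), true);
}
```

The callback receives two updates per prompt, so the UI can render the rough figure instantly and replace it without blocking on the download.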

Question 2

Are the token counts accurate for Claude and Gemini too?

Accepted Answer

It applies the o200k count to every model, but Claude and Gemini use their own tokenizers. For typical English and code the counts usually land within a few percent, so the cost projection for those models is close but not exact.

Question 3

Is my prompt sent anywhere?

Accepted Answer

No. Tokenization and the cost math run entirely in your browser, and the prompt is never sent to a server. The only network fetch is the tokenizer code itself, not your text.

Question 4

Why does it show what percentage of the context window I use?

Accepted Answer

Because a prompt that exceeds the window does not always fail loudly: depending on the client, it can be silently truncated, and the model then ignores the parts that fell off. The context percentage warns you before that happens, which matters as much as the raw token count.
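A sketch of the context-percentage check, assuming the window size is passed in per model. The 80% warning threshold and the function names are hypothetical, and the 128,000-token window in the example is just an illustrative value.

```typescript
// Share of a model's context window taken by the prompt, as a percentage.
function contextPercent(promptTokens: number, contextWindow: number): number {
  return (promptTokens / contextWindow) * 100;
}

// Formats tiny non-zero shares as "<0.1%", the style the inspector displays.
function formatContextPercent(pct: number): string {
  if (pct > 0 && pct < 0.1) return "<0.1%";
  return `${pct.toFixed(1)}%`;
}

// Returns a warning once usage nears or passes the window; warnAt is a hypothetical default.
function contextWarning(pct: number, warnAt = 80): string | null {
  if (pct >= 100) return "exceeds the context window; input may be truncated or rejected";
  if (pct >= warnAt) return `uses ${formatContextPercent(pct)} of the context window`;
  return null;
}

// Example: a 90-token prompt against a 128,000-token window.
console.log(formatContextPercent(contextPercent(90, 128_000))); // "<0.1%"
```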

Question 5

How does it work out cost per call and per month?

Accepted Answer

It multiplies your input token count by each model's per-million input price and an estimated output length by its per-million output price, then adds the two to get the cost per call. The monthly figure is that per-call cost times your calls per day times 30 days. You can change the output length and call volume to fit your workload.

Model	Cost per call	Cost per month	Context window used
Claude Haiku	$0.0021	$62.16	<0.1%
Claude Sonnet	$0.0078	$233.10	<0.1%
Claude Opus	$0.04	$1,165.50	<0.1%
GPT-4o mini	$0.00031	$9.41	<0.1%
GPT-4o	$0.0052	$156.75	<0.1%
Gemini 1.5 Flash	$0.00016	$4.70	<0.1%
Gemini 1.5 Pro	$0.0026	$78.38	<0.1%
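The per-call and monthly formula can be sketched as follows. The GPT-4o prices ($2.50 input, $10 output per million tokens) and the workload of 90 input tokens, 500 output tokens, and 1,000 calls per day are assumptions: they are back-solved values that appear consistent with the table, not figures the tool states.

```typescript
// Per-million-token prices for one model; the values used below are assumptions.
interface ModelPricing {
  name: string;
  inputPerMillion: number;  // USD per 1M input tokens
  outputPerMillion: number; // USD per 1M output tokens
}

const DAYS_PER_MONTH = 30;

// Input and output tokens are priced separately, then summed.
function costPerCall(p: ModelPricing, inputTokens: number, outputTokens: number): number {
  return (inputTokens * p.inputPerMillion + outputTokens * p.outputPerMillion) / 1_000_000;
}

// Scales the per-call cost by daily volume over a 30-day month.
function costPerMonth(
  p: ModelPricing,
  inputTokens: number,
  outputTokens: number,
  callsPerDay: number,
): number {
  return costPerCall(p, inputTokens, outputTokens) * callsPerDay * DAYS_PER_MONTH;
}

// Assumed prices and back-solved workload, for illustration only.
const gpt4o: ModelPricing = { name: "GPT-4o", inputPerMillion: 2.5, outputPerMillion: 10 };
const perCall = costPerCall(gpt4o, 90, 500);         // ≈ 0.005225
const perMonth = costPerMonth(gpt4o, 90, 500, 1000); // ≈ 156.75
console.log(`${gpt4o.name}: $${perCall.toFixed(4)} / call, $${perMonth.toFixed(2)} / month`);
// prints "GPT-4o: $0.0052 / call, $156.75 / month"
```

Under these assumptions the output matches the GPT-4o row. The output term dominates here (5,000 of 5,225 millionths of a dollar), which is why the estimated output length is worth tuning.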
