Engineering notes

Production AI and agents, cloud architecture, and what it actually takes to ship products that hold up. No fluff, just what I've learned building and shipping.

June 21, 2026 · 3 min read
Why agent costs explode: the quadratic context tax
Every tool-call an LLM agent makes re-sends the whole conversation. That turns a linear-looking feature into a quadratic bill. Here's the math, and how to cut it.
AI, LLM, cost, agents

June 21, 2026 · 3 min read
Stop reporting uptime. Start spending an error budget.
Uptime percentage is a vanity metric. An error budget turns it into a decision: when to ship features, and when to stop and fix reliability.
reliability, SRE, SLO

June 19, 2026 · 3 min read
How many instances do you actually need? Little's Law in one afternoon
Most capacity plans are a guess. Little's Law turns requests per second and latency into the fleet size you actually need — no over-provisioning.
capacity, scaling, performance

June 17, 2026 · 3 min read
A latency budget you can defend in review
A p95 target is meaningless until you divide it up. A latency budget carves it across the hops a request takes, so performance is a number you can defend.
performance, latency

June 14, 2026 · 3 min read
Claude vs GPT-4o vs Gemini: a real cost breakdown
List prices hide the real story. Here's how Claude, GPT-4o and Gemini actually compare once you account for context, tool-calls and the work each model gets done per dollar.
AI, LLM, cost

June 7, 2026 · 2 min read
Cutting LLM spend: caching, batching & context reuse
Five levers that routinely halve an LLM bill without touching what the product does — prompt caching, batching, context trimming, model routing, and killing redundant calls.
AI, LLM, cost, GCP
