glossary

The terms behind a system that holds up

Plain-English definitions for the reliability, performance, AI, and security concepts I work with every day. Each one links to a free tool that puts the number to work on your own stack.

Reliability & scale

The targets, budgets, and capacity math behind a service that stays up under load.

Availability (the Nines)

Availability is the fraction of time a service is up, usually quoted in 'nines': 99.9% (three nines) is about 43 minutes of downtime a month, 99.99% (four nines) about 4 minutes.
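To make the arithmetic concrete, here is a minimal Python sketch that turns an availability target into allowed downtime. The function name `downtime_minutes` and the 30-day default window are my assumptions for illustration.

```python
def downtime_minutes(availability_pct: float, window_days: float = 30.0) -> float:
    """Allowed downtime, in minutes, for an availability target over a window."""
    window_minutes = window_days * 24 * 60
    # The unavailable fraction is whatever is left after the target.
    return window_minutes * (1 - availability_pct / 100)


for target in (99.9, 99.99):
    print(f"{target}% -> {downtime_minutes(target):.1f} min per 30 days")
```

Over a 30-day month this gives roughly 43.2 minutes for three nines and 4.3 minutes for four.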

Read definition

Burn Rate

Burn rate is how fast you are spending an error budget relative to spending it evenly: 1x exhausts it exactly at the end of the window, while 14.4x exhausts a 30-day budget in about two days.
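A rough sketch of the burn-rate math, assuming a 30-day SLO window and an error rate measured as a fraction of requests. The names `burn_rate` and `days_to_exhaustion` are hypothetical.

```python
def burn_rate(observed_error_rate: float, slo_target: float) -> float:
    """How many times faster than 'evenly' the budget is being spent."""
    budget_fraction = 1 - slo_target  # e.g. 0.001 for a 99.9% SLO
    return observed_error_rate / budget_fraction


def days_to_exhaustion(rate: float, window_days: float = 30.0) -> float:
    """Days until the whole budget is gone if this burn rate holds."""
    if rate <= 0:
        return float("inf")  # no errors, so the budget never runs out
    return window_days / rate


rate = burn_rate(observed_error_rate=0.0144, slo_target=0.999)
print(f"burn rate {rate:.1f}x, budget gone in {days_to_exhaustion(rate):.2f} days")
```

With a 99.9% SLO, a sustained 1.44% error rate is a 14.4x burn, which empties a 30-day budget in a little over two days.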

Read definition

Concurrency

Concurrency is the number of requests a system is handling at the same instant, which sets how many workers, connections, and instances you need, distinct from throughput (requests finished per second).

Read definition

Cost of Downtime

The cost of downtime is what an outage costs you: the revenue lost per hour while you are down, plus the hourly cost of the engineers firefighting, multiplied by how long the outage lasts.
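As a back-of-the-envelope sketch, the definition above reduces to one line of arithmetic. The function name, parameter names, and the figures in the example are all made up for illustration, not data from a real outage.

```python
def outage_cost(
    revenue_per_hour: float,
    engineers: int,
    engineer_cost_per_hour: float,
    outage_hours: float,
) -> float:
    """Lost revenue plus firefighting time, scaled by outage duration."""
    hourly_cost = revenue_per_hour + engineers * engineer_cost_per_hour
    return hourly_cost * outage_hours


# Illustrative inputs only: 5,000/hour revenue, 4 engineers at 120/hour, 90 minutes down.
print(outage_cost(5000, 4, 120, 1.5))  # 8220.0
```

This deliberately leaves out harder-to-measure costs such as churn or SLA credits, which would sit on top of this figure.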

Read definition

Error Budget

An error budget is the amount of failure an SLO permits: the share of requests or minutes you are allowed to lose before you have to stop shipping and fix reliability.
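Here is a small sketch of a request-based error budget, assuming you can count total and failed requests over the SLO window. Both function names are hypothetical.

```python
def error_budget_requests(slo_target: float, total_requests: int) -> float:
    """Number of requests the SLO allows to fail over the window."""
    return (1 - slo_target) * total_requests


def budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative once overspent)."""
    budget = error_budget_requests(slo_target, total_requests)
    return 1 - failed_requests / budget


allowed = error_budget_requests(0.999, 10_000_000)
left = budget_remaining(0.999, 10_000_000, failed_requests=2_500)
print(f"allowed failures: {allowed:.0f}, budget remaining: {left:.0%}")
```

At 99.9% over ten million requests the budget is about 10,000 failures, so 2,500 failures leaves roughly three quarters of it unspent.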

Read definition

Little's Law

Little's Law says the average number of requests in a system equals arrival rate times average time in the system (L = λ × W), which is how you size concurrency from throughput and latency.
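A minimal sketch of sizing concurrency with L = λ × W. The `target_utilisation` headroom factor is my own assumption layered on top of Little's Law, not part of the law itself.

```python
import math


def required_concurrency(arrival_rate_per_s: float, avg_latency_s: float) -> float:
    """Little's Law: average requests in flight, L = lambda * W."""
    return arrival_rate_per_s * avg_latency_s


def workers_needed(
    arrival_rate_per_s: float,
    avg_latency_s: float,
    target_utilisation: float = 0.7,  # assumed headroom so workers are not run at 100%
) -> int:
    in_flight = required_concurrency(arrival_rate_per_s, avg_latency_s)
    return math.ceil(in_flight / target_utilisation)


print(required_concurrency(200, 0.25))  # 50.0 requests in flight on average
print(workers_needed(200, 0.25))        # 72 workers at 70% target utilisation
```

Because the law works on averages, bursty traffic will need more headroom than this suggests.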

Read definition

Service Level Objective (SLO)

A Service Level Objective is the target reliability you commit a service to, written as a number like 99.9% of requests succeeding over a 30-day window.

Read definition

Performance & delivery

How fast a page feels, where the time goes, and how rendering and caching change it.

Cache Hit Ratio

Cache hit ratio is the share of requests served from cache rather than the origin: a higher ratio means less origin load, lower latency, and lower egress and compute bills.
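A quick sketch of why small changes in hit ratio matter, assuming every miss becomes exactly one origin request. The function names are hypothetical.

```python
def cache_hit_ratio(hits: int, misses: int) -> float:
    """Share of requests answered from cache."""
    total = hits + misses
    if total == 0:
        raise ValueError("no requests recorded")
    return hits / total


def origin_requests(total_requests: int, hit_ratio: float) -> float:
    """Requests that fall through to the origin at a given hit ratio."""
    return total_requests * (1 - hit_ratio)


for ratio in (0.90, 0.95):
    print(f"{ratio:.0%} hit ratio -> {origin_requests(1_000_000, ratio):,.0f} origin requests")
```

Moving from 90% to 95% looks like a small gain, yet it halves the traffic the origin has to serve.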

Read definition

Cold Start

A cold start is the extra latency the first request pays when a serverless instance or container has to be created and initialised from nothing before it can serve traffic.

Read definition

Content Delivery Network (CDN)

A CDN is a network of edge servers that cache and serve your content close to users, cutting latency and origin load by answering most requests without a round trip to your servers.

Read definition

Core Web Vitals

Core Web Vitals are Google's three user-experience metrics: Largest Contentful Paint (loading), Interaction to Next Paint (responsiveness), and Cumulative Layout Shift (visual stability).

Read definition

Latency Budget

A latency budget is a total p95 response-time target split across the hops a request takes (network, app, database, cache, third parties) so each layer knows the time it is allowed to spend.
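One way to sketch a latency budget is as a table of per-hop allowances checked against measurements. The hop names and millisecond values below are illustrative assumptions. Per-hop p95 values do not add up exactly to the end-to-end p95, so treat this as a planning aid rather than a precise model.

```python
def over_budget(budget_ms: dict[str, float], measured_ms: dict[str, float]) -> dict[str, float]:
    """Return each hop that exceeds its allowance and by how many milliseconds."""
    overruns = {}
    for hop, allowed in budget_ms.items():
        measured = measured_ms.get(hop, 0.0)  # a hop with no measurement counts as 0 ms
        if measured > allowed:
            overruns[hop] = measured - allowed
    return overruns


# Hypothetical 300 ms p95 target split across five hops.
budget = {"network": 50, "app": 120, "database": 80, "cache": 10, "third_party": 40}
measured = {"network": 45, "app": 110, "database": 130, "cache": 5, "third_party": 40}

print(sum(budget.values()))           # 300
print(over_budget(budget, measured))  # {'database': 50}
```

The point of the split is visible in the output: the total may still look acceptable while one layer has quietly spent more than its share.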

Read definition

p95 Latency

p95 latency is the response time that 95% of requests come in under, which means 1 request in 20 is slower: a tail-latency measure that, unlike an average, shows what your worst-served users actually experience.
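To show how an average can hide the tail, here is a sketch using the nearest-rank percentile method. Monitoring tools may interpolate differently, and the sample latencies are invented for illustration.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Invented data: 90 fast requests and 10 slow ones, in milliseconds.
latencies_ms = [100] * 90 + [800] * 10

print(sum(latencies_ms) / len(latencies_ms))  # 170.0 (mean)
print(percentile(latencies_ms, 50))           # 100
print(percentile(latencies_ms, 95))           # 800
```

The mean of 170 ms suggests a healthy service, while the p95 of 800 ms shows that one request in ten is waiting eight times longer than the median.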

Read definition

Rendering Strategies (SSR, SSG, ISR, CSR)

Rendering strategies decide where and when your HTML is built: at build time (SSG), per request on the server (SSR), regenerated on a schedule (ISR), or in the browser (CSR). Each trades freshness, speed, and cost differently.

Read definition

Time to First Byte (TTFB)

Time to First Byte is how long from a request until the first byte of the response arrives: it captures DNS, connection, and server processing, and sets the floor for every page-load metric after it.

Read definition

AI & agents

What you pay for, what the model can see, and what breaks when an LLM gets tools.

Agentic Context Tax

The agentic context tax is the way an AI agent's cost grows faster than its work: every tool call adds a turn, and each turn re-sends the whole conversation, so input tokens scale with roughly the square of the tool calls.
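The quadratic growth is easy to see in a toy model. This sketch assumes a fixed starting prompt, a fixed number of tokens added per turn, and no prompt caching or history truncation. All names and numbers are hypothetical.

```python
def total_input_tokens(tool_calls: int, base_tokens: int, tokens_per_turn: int) -> int:
    """Sum the input tokens billed across an agent run that re-sends full history."""
    total = 0
    context = base_tokens
    # One model call per tool call, plus a final call that produces the answer.
    for _ in range(tool_calls + 1):
        total += context            # the whole conversation so far is sent again
        context += tokens_per_turn  # the tool call and its result join the history
    return total


for calls in (0, 10, 20):
    print(calls, total_input_tokens(calls, base_tokens=1000, tokens_per_turn=500))
```

Under these assumptions, 10 tool calls cost 38,500 input tokens and 20 cost 126,000: doubling the work more than triples the bill.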

Read definition

Context Window

The context window is the maximum number of tokens a model can consider at once, covering the prompt, any retrieved documents or conversation history, and the response. Exceed it and the oldest content is dropped or the call fails.

Read definition

Idempotency

An operation is idempotent if running it twice has the same effect as running it once. It is what makes retries safe, so a duplicated request, message, or tool call does not double-charge or double-act.
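A common way to get this property is an idempotency key: the caller sends the same key on every retry and the server replays the stored result instead of acting again. The sketch below is an in-memory illustration with a hypothetical `PaymentService` class. A real service would need a durable store and protection against concurrent requests with the same key.

```python
class PaymentService:
    """Toy service that charges at most once per idempotency key."""

    def __init__(self) -> None:
        self._results: dict[str, str] = {}  # idempotency key -> receipt already issued
        self.charges: list[int] = []        # every charge actually performed

    def charge(self, idempotency_key: str, amount_cents: int) -> str:
        # A retry with a known key replays the earlier result without charging again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges.append(amount_cents)
        receipt = f"receipt-{len(self.charges)}"
        self._results[idempotency_key] = receipt
        return receipt


service = PaymentService()
first = service.charge("order-42", 1999)
retry = service.charge("order-42", 1999)  # e.g. the client timed out and retried
print(first == retry, len(service.charges))  # True 1
```

The retry returns the same receipt and the customer is charged once, which is exactly what makes retrying safe.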

Read definition

Prompt Injection

Prompt injection is an attack where untrusted text the model reads (a web page, a document, a tool result) contains instructions that hijack the model into ignoring its task or misusing its tools.

Read definition

Retrieval-Augmented Generation (RAG)

RAG is the pattern of fetching relevant documents at query time and putting them in the prompt so the model answers from your data instead of its training, without retraining the model.

Read definition

Tokens (LLM)

A token is the unit a language model reads and writes: a chunk of text (often a word piece) produced by the model's tokenizer. Pricing, context limits, and speed are all measured in tokens, not words or characters.

Read definition

Tool Calling (Function Calling)

Tool calling is how an LLM acts on the world: you describe a set of tools (name, description, parameter schema) and the model chooses which to call and with what arguments, picking entirely from the text of those definitions.

Read definition

Security & SEO

The headers, tokens, and markup that decide how safe a page is and how it is read.

Content Security Policy (CSP)

A Content Security Policy is an HTTP header that tells the browser which sources of scripts, styles, images, and other content it is allowed to load, which makes it one of the strongest browser-side defences against cross-site scripting.
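As an illustration, this sketch builds a fairly strict policy string with a per-response script nonce. The directive set is one reasonable starting point rather than a recommendation for every site, and `build_csp` is a hypothetical helper.

```python
import secrets


def build_csp(nonce: str) -> str:
    """Assemble a Content-Security-Policy header value from a directive table."""
    directives = {
        "default-src": ["'self'"],
        "script-src": ["'self'", f"'nonce-{nonce}'"],  # only same-origin or nonce-tagged scripts
        "object-src": ["'none'"],
        "base-uri": ["'self'"],
    }
    return "; ".join(f"{name} {' '.join(sources)}" for name, sources in directives.items())


# Generate a fresh nonce for each response and put the same value on allowed <script> tags.
nonce = secrets.token_urlsafe(16)
headers = {"Content-Security-Policy": build_csp(nonce)}
print(headers["Content-Security-Policy"])
```

An injected script without the matching nonce is refused by the browser, which is how the policy blunts cross-site scripting even if an injection bug slips through.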

Read definition

JSON Web Token (JWT)

A JWT is a compact, signed token (header, payload, signature) that carries claims like who the user is and when the token expires, so a server can verify a session without a database lookup.

Read definition

Structured Data (JSON-LD / Schema.org)

Structured data is machine-readable markup, usually JSON-LD following schema.org vocabulary, that describes what a page is about so search engines can understand it and show rich results.
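For a glossary entry, the markup could look something like the output of this sketch, which uses the schema.org `DefinedTerm` type. The helper name and the example URL are hypothetical, and whether a search engine shows a rich result for any given type is up to that search engine.

```python
import json


def defined_term_jsonld(name: str, description: str, url: str) -> str:
    """Render a schema.org DefinedTerm as a JSON-LD script tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "DefinedTerm",
        "name": name,
        "description": description,
        "url": url,
    }
    # Escape "<" so a stray "</script>" inside a value cannot end the tag early.
    body = json.dumps(data).replace("<", "\\u003c")
    return f'<script type="application/ld+json">{body}</script>'


print(
    defined_term_jsonld(
        name="Error Budget",
        description="The amount of failure an SLO permits.",
        url="https://example.com/glossary/error-budget",  # placeholder URL
    )
)
```

The tag goes in the page's HTML, and the visible content should say the same thing the markup claims.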

Read definition

Past the definitions, where does your stack actually stand?

I run a fixed-scope review across reliability, performance, cost, and AI readiness, and hand you a prioritized roadmap. Book a call to talk it through.