Skip to content

glossary

The terms behind a system that holds up

Plain-English definitions for the reliability, performance, AI, and security concepts I work with every day. Each one links to a free tool that puts the number to work on your own stack.

reliability_scale

Reliability & scale

The targets, budgets, and capacity math behind a service that stays up under load.

performance_delivery

Performance & delivery

How fast a page feels, where the time goes, and how rendering and caching change it.

Cache Hit Ratio

Cache hit ratio is the share of requests served from cache rather than the origin: a higher ratio means less origin load, lower latency, and lower egress and compute bills.

Read definition

Cold Start

A cold start is the extra latency the first request pays when a serverless instance or container has to be created and initialised from nothing before it can serve traffic.

Read definition

Content Delivery Network (CDN)

A CDN is a network of edge servers that cache and serve your content close to users, cutting latency and origin load by answering most requests without a round trip to your servers.

Read definition

Core Web Vitals

Core Web Vitals are Google's three user-experience metrics: Largest Contentful Paint (loading), Interaction to Next Paint (responsiveness), and Cumulative Layout Shift (visual stability).

Read definition

Latency Budget

A latency budget is a total p95 response-time target split across the hops a request takes (network, app, database, cache, third parties) so each layer knows the time it is allowed to spend.

Read definition

p95 Latency

p95 latency is the response time that 95% of requests come in under: a tail-latency measure that, unlike an average, reflects what your slowest and most-affected users actually experience.

Read definition

Rendering Strategies (SSR, SSG, ISR, CSR)

Rendering strategies decide where and when your HTML is built: at build time (SSG), per request on the server (SSR), regenerated on a schedule (ISR), or in the browser (CSR). Each trades freshness, speed, and cost differently.

Read definition

Time to First Byte (TTFB)

Time to First Byte is how long from a request until the first byte of the response arrives: it captures DNS, connection, and server processing, and sets the floor for every page-load metric after it.

Read definition

ai_agents

AI & agents

What you pay for, what the model can see, and what breaks when an LLM gets tools.

Agentic Context Tax

The agentic context tax is the way an AI agent's cost grows faster than its work: every tool call adds a turn, and each turn re-sends the whole conversation, so input tokens scale with roughly the square of the tool calls.

Read definition

Context Window

The context window is the maximum number of tokens a model can consider at once, covering the prompt, any retrieved or conversation history, and the response. Exceed it and the oldest content is dropped or the call fails.

Read definition

Idempotency

An operation is idempotent if running it twice has the same effect as running it once. It is what makes retries safe, so a duplicated request, message, or tool call does not double-charge or double-act.

Read definition

Prompt Injection

Prompt injection is an attack where untrusted text the model reads (a web page, a document, a tool result) contains instructions that hijack the model into ignoring its task or misusing its tools.

Read definition

Retrieval-Augmented Generation (RAG)

RAG is the pattern of fetching relevant documents at query time and putting them in the prompt so the model answers from your data instead of its training, without retraining the model.

Read definition

Tokens (LLM)

A token is the unit a language model reads and writes: a chunk of text (often a word piece) produced by the model's tokenizer. Pricing, context limits, and speed are all measured in tokens, not words or characters.

Read definition

Tool Calling (Function Calling)

Tool calling is how an LLM acts on the world: you describe a set of tools (name, description, parameter schema) and the model chooses which to call and with what arguments, picking entirely from the text of those definitions.

Read definition

Past the definitions, where does your stack actually stand?

I run a fixed-scope review across reliability, performance, cost, and AI readiness, and hand you a prioritized roadmap. Book a call to talk it through.

Book a call