

Is your Cloud Run service configured right?

A service that deploys fine can still leak secrets through plain env vars, scale out into a bill nobody budgeted for, or oversubscribe a tiny instance until it OOMs. Paste your gcloud command or service.yaml and get a graded report with the exact flag to change for each gap.

The config you paste runs entirely in your browser. It is never uploaded, sent to a server, or stored. (Anonymous usage metrics, never your config text, are sent to analytics.)

F

Cloud Run config health

30/100

2 to fix · 3 warnings · 1 passed · 6 notes · parsed as gcloud


Secrets come from Secret Manager, not plain env vars

high

Secret-looking value(s) set as plain env vars: API_KEY. Env vars on a Cloud Run revision are stored in plaintext in the revision spec and are readable by anyone with the run.services.get permission. Move the value to Secret Manager, drop it from --set-env-vars, and reference it with --set-secrets so only the secret reference, not the value, lives in the service definition.

--set-secrets=API_KEY=my-api-key:latest
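Here is a minimal TypeScript sketch of a name-based check and the replacement flag it could suggest. The SECRET_NAME pattern, the secretIdFor naming convention and the helper names are assumptions for illustration, not the auditor's actual rule. The env-var parser also ignores gcloud's alternate-delimiter escaping (see gcloud topic escaping).

```typescript
// Hypothetical heuristic: this name pattern is an illustration, not the auditor's real rule.
const SECRET_NAME = /(KEY|SECRET|TOKEN|PASSWORD|PASSWD|CREDENTIAL)/i;

// Parse a --set-env-vars value like "API_KEY=abc,LOG_LEVEL=info".
// Simplification: assumes no commas inside values (gcloud's escaping syntax is ignored).
function parseEnvVars(flagValue: string): Map<string, string> {
  const vars = new Map<string, string>();
  for (const pair of flagValue.split(",")) {
    const eq = pair.indexOf("=");
    if (eq > 0) vars.set(pair.slice(0, eq).trim(), pair.slice(eq + 1));
  }
  return vars;
}

// Names of env vars that look like credentials.
function secretLookingNames(vars: Map<string, string>): string[] {
  return Array.from(vars.keys()).filter((envName) => SECRET_NAME.test(envName));
}

// Build a --set-secrets value; secretIdFor is a caller-chosen (hypothetical) naming convention.
function toSetSecrets(names: string[], secretIdFor: (envName: string) => string): string {
  const mappings = names.map((envName) => `${envName}=${secretIdFor(envName)}:latest`);
  return `--set-secrets=${mappings.join(",")}`;
}

const envVars = parseEnvVars("API_KEY=sk-123,LOG_LEVEL=info");
const flagged = secretLookingNames(envVars); // only API_KEY matches the pattern
console.log(toSetSecrets(flagged, (n) => n.toLowerCase().replace(/_/g, "-")));
// prints --set-secrets=API_KEY=api-key:latest
```

A name check alone misses secrets stored under bland names. Pairing it with a value check (for example, known key prefixes) catches more, at the cost of more false positives.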

max-instances caps runaway cost and quota

high

No max-instances set, so the service can scale to the default ceiling of 100 instances under a traffic spike or flood. A retry storm or a scrape can fan out to the cap and bill every instance. Set an explicit max-instances you can afford, sized from your real peak.

--max-instances=10
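Here is one way to size that number, sketched in TypeScript under stated assumptions. Little's law says requests in flight equal the arrival rate times the time each request spends in the system. Dividing peak in-flight requests by per-instance concurrency gives the instances you need, and a headroom multiplier covers spikes. The traffic figures are hypothetical; plug in your own peak RPS and latency from monitoring or a load test.

```typescript
// Sketch: size --max-instances from peak traffic with Little's law.
// Every input is an assumption you replace with numbers from monitoring or a load test.
interface PeakTraffic {
  peakRps: number;     // peak requests per second
  latencySec: number;  // mean request latency in seconds (a higher percentile is more conservative)
  concurrency: number; // the service's --concurrency
  headroom: number;    // spike multiplier, e.g. 2 = twice the computed need
}

function suggestMaxInstances(t: PeakTraffic): number {
  // Little's law: requests in flight = arrival rate x time each request spends in the system.
  const inFlight = t.peakRps * t.latencySec;
  // One instance holds at most `concurrency` requests at a time.
  const needed = Math.ceil(inFlight / t.concurrency);
  return Math.max(1, Math.ceil(needed * t.headroom));
}

// 400 rps x 0.25 s = 100 in flight; 100 / 20 = 5 instances; x 2 headroom = 10.
console.log(suggestMaxInstances({ peakRps: 400, latencySec: 0.25, concurrency: 20, headroom: 2 })); // 10
```

The result is a traffic-based floor for the cap. If that many instances is more than you can afford, the budget should win, and requests beyond the cap queue or fail instead of fanning out.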

Concurrency is tuned to the instance size

medium

Concurrency 80 on a small instance (256Mi memory) oversubscribes it: every in-flight request shares the same CPU and memory, so tail latency climbs and a burst can OOM the instance. Either lower concurrency or give the instance more CPU/memory. Load test to find the real per-instance ceiling.

--concurrency=20   # or raise --cpu / --memory to match the load
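The memory side of that tradeoff is simple arithmetic. This sketch assumes memory grows as a fixed baseline plus a roughly constant amount per in-flight request. The 96 MiB baseline and 8 MiB per request are made-up figures. CPU can be the tighter limit, so a load test still has the final word.

```typescript
// Sketch: pick a --concurrency that fits the memory limit.
// baselineMiB (idle process footprint) and perRequestMiB (extra memory per in-flight
// request) are hypothetical inputs; measure your own under load.
function maxConcurrencyForMemory(limitMiB: number, baselineMiB: number, perRequestMiB: number): number {
  const budget = limitMiB - baselineMiB;
  if (budget <= 0) return 0; // the idle process alone already fills the limit
  return Math.floor(budget / perRequestMiB);
}

// At the default concurrency of 80, a 256Mi instance leaves 256 / 80 MiB per request,
// before the runtime itself takes its share.
console.log(256 / 80); // 3.2
// With a 96 MiB baseline and about 8 MiB per request: (256 - 96) / 8 = 20.
console.log(maxConcurrencyForMemory(256, 96, 8)); // 20
```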

CPU allocation matches the workload

medium

CPU is always allocated (--no-cpu-throttling) and min-instances is 1, so you pay for CPU on idle instances around the clock, not just during requests. That's the right combination for background work, streaming, or websockets, but for a request/response API it's pure idle spend. Drop to request-based CPU unless you run work outside of requests.

--cpu-throttling   # bill CPU only during requests
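This TypeScript sketch shows the scale of the idle spend. It counts billed CPU-seconds for one warm instance over a 30-day month under each mode. The busy fraction is an assumed input, and the two modes have different unit prices, so treat this as a comparison of seconds, not a cost estimate.

```typescript
// Sketch: CPU-seconds billed for one warm instance under each CPU mode.
// Counts seconds only: the modes have different unit prices, and busyFraction
// (share of time with at least one request in flight) is an assumed input.
type CpuMode = "always-allocated" | "request-based";

function billedCpuSeconds(mode: CpuMode, periodSec: number, busyFraction: number): number {
  // --no-cpu-throttling: CPU is billed for the instance's whole lifetime,
  // and min-instances keeps that lifetime running around the clock.
  if (mode === "always-allocated") return periodSec;
  // Request-based: CPU is billed at the active rate only while requests are in flight;
  // an idle min instance is billed separately at the lower idle rate (not counted here).
  return Math.round(periodSec * busyFraction);
}

const MONTH_SEC = 30 * 24 * 3600; // 2,592,000 seconds in a 30-day month
const busyShare = 0.05; // assumed: a request is in flight 5% of the time

console.log(billedCpuSeconds("always-allocated", MONTH_SEC, busyShare)); // 2592000
console.log(billedCpuSeconds("request-based", MONTH_SEC, busyShare)); // 129600
```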

Runs as a least-privilege service account

medium

No service account set, so the service falls back to the default Compute Engine service account, which is broadly privileged (often Editor). Create a least-privilege service account for this service and pass it explicitly.

--service-account=my-service@PROJECT_ID.iam.gserviceaccount.com
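Two service-account email shapes are involved. A user-managed account looks like ACCOUNT_ID@PROJECT_ID.iam.gserviceaccount.com. The Compute Engine default account looks like PROJECT_NUMBER-compute@developer.gserviceaccount.com. The TypeScript sketch below uses them for a hypothetical version of this check; the function names are illustrative.

```typescript
// Matches the Compute Engine default service account: PROJECT_NUMBER-compute@developer.gserviceaccount.com
const DEFAULT_COMPUTE_SA = /^\d+-compute@developer\.gserviceaccount\.com$/;

// Email of a user-managed service account (hypothetical helper).
function userManagedSaEmail(accountId: string, projectId: string): string {
  return `${accountId}@${projectId}.iam.gserviceaccount.com`;
}

// undefined means --service-account was not passed, so Cloud Run falls back to the default SA.
function usesDefaultComputeSa(serviceAccount: string | undefined): boolean {
  return serviceAccount === undefined || DEFAULT_COMPUTE_SA.test(serviceAccount);
}

console.log(usesDefaultComputeSa(undefined)); // true
console.log(usesDefaultComputeSa("123456789012-compute@developer.gserviceaccount.com")); // true
console.log(usesDefaultComputeSa(userManagedSaEmail("my-service", "my-project"))); // false
```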

Request timeout is bounded

Request timeout is 300s, a sane bound. A request that exceeds it is terminated, which protects the instance from being pinned by a stuck client.

min-instances reflects the cold-start vs idle-cost tradeoff

min-instances is 1, so you keep one warm instance to avoid cold starts. That's the right call for latency-sensitive traffic, but the instance bills 24/7 even at zero traffic: at the full rate when CPU is always allocated, or at the reduced idle rate when CPU is request-based. Confirm the cold-start avoidance is worth the steady cost.

Startup probe gates traffic for slow boots

Startup and liveness probes aren't set on a gcloud run deploy command (there's no deploy flag for them). If the app is slow to boot, define a startup probe so Cloud Run waits for readiness before routing traffic instead of failing the deploy. Probes are configured in the service YAML or with gcloud run services update.
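As a sketch of the YAML route, here is a service definition with a startup probe, built in TypeScript and serialized as JSON (JSON is valid YAML). The image, the /healthz path and the timings are placeholders, and the fields follow the Kubernetes probe shape. Check Cloud Run's probe limits before relying on specific values.

```typescript
// Sketch: a Knative Service definition with a startup probe, serialized to JSON.
// Image, path and timings are placeholders, not recommendations.
const service = {
  apiVersion: "serving.knative.dev/v1",
  kind: "Service",
  metadata: { name: "my-service" },
  spec: {
    template: {
      spec: {
        containers: [
          {
            image: "REGION-docker.pkg.dev/PROJECT_ID/REPO/IMAGE",
            ports: [{ containerPort: 8080 }],
            startupProbe: {
              httpGet: { path: "/healthz", port: 8080 }, // hypothetical readiness endpoint
              initialDelaySeconds: 0,
              periodSeconds: 10,
              failureThreshold: 6,
              timeoutSeconds: 3,
            },
          },
        ],
      },
    },
  },
};

// Rough boot budget before the probe gives up: initial delay + failures x period.
const probe = service.spec.template.spec.containers[0].startupProbe;
const bootBudgetSec = probe.initialDelaySeconds + probe.failureThreshold * probe.periodSeconds;
console.log(bootBudgetSec); // 60
console.log(JSON.stringify(service, null, 2));
```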

Public vs authenticated access is intentional

--allow-unauthenticated makes the service publicly reachable. That's correct for a public website or API, but make sure it's intentional: an internal service exposed this way is open to the internet. For internal-only traffic, set --ingress=internal and pass --no-allow-unauthenticated. On an existing service, simply omitting --allow-unauthenticated leaves the public access binding in place.

Ingress is scoped to where traffic should come from

Ingress allows all traffic and the service is public, which is expected for a public site. If this is actually an internal API, set --ingress=internal so only VPC and internal sources can reach it.
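The access and ingress notes above come down to a choice between two flag sets. This TypeScript sketch covers only the public and internal-only cases; gcloud also accepts --ingress=internal-and-cloud-load-balancing for traffic that arrives through Cloud Load Balancing. The Exposure type and accessFlags name are hypothetical.

```typescript
// Sketch: map the intended exposure to gcloud access flags.
// Exposure and accessFlags are hypothetical names; the flags and ingress values are gcloud's.
type Exposure = "public" | "internal";

function accessFlags(exposure: Exposure): string[] {
  if (exposure === "public") {
    // Anyone on the internet can reach and invoke the service.
    return ["--ingress=all", "--allow-unauthenticated"];
  }
  // Only VPC/internal sources can reach it, and callers must present IAM credentials.
  // --no-allow-unauthenticated also removes a public binding left by an earlier deploy.
  return ["--ingress=internal", "--no-allow-unauthenticated"];
}

console.log(accessFlags("internal").join(" ")); // --ingress=internal --no-allow-unauthenticated
```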

Startup CPU boost shortens cold starts

Startup CPU boost is off. It gives a booting instance extra CPU to cut cold-start time. It mainly helps services that scale from zero or boot under CPU pressure, so it's optional when you keep warm instances.

Server binds 0.0.0.0:$PORT

Cloud Run injects the PORT env var (default 8080) and routes traffic to it. Make sure your server reads process.env.PORT and binds 0.0.0.0, not 127.0.0.1, or Cloud Run can't reach it and the revision never becomes ready.
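Here is a minimal Node.js version of that contract in TypeScript. resolvePort and startServer are hypothetical names, and the 8080 fallback mirrors the default above so the server also runs locally.

```typescript
import * as http from "node:http";

// Read the port Cloud Run injects; fall back to 8080 when PORT is unset (e.g. local runs).
function resolvePort(env: Record<string, string | undefined>): number {
  const port = Number(env.PORT ?? "8080");
  if (!Number.isInteger(port) || port <= 0) throw new Error(`invalid PORT: ${env.PORT}`);
  return port;
}

function startServer(): http.Server {
  const server = http.createServer((_req, res) => {
    res.writeHead(200, { "content-type": "text/plain" });
    res.end("ok\n");
  });
  // Bind all interfaces: a server bound only to 127.0.0.1 is unreachable from Cloud Run.
  server.listen(resolvePort(process.env), "0.0.0.0");
  return server;
}
```

Call startServer() from your entry point. Keeping the port lookup in a pure function makes the fallback easy to test without opening a socket.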

A clean config is the floor, not the ceiling. The autoscaling bounds, the concurrency, the CPU mode and the secrets wiring are where Cloud Run bills and breaks. That's the kind of review I do.

Get your deploy production-ready: book a call

Static analysis of the config text only. There is no YAML parser bundled, so YAML support is targeted: it reads the known Knative keys (containerConcurrency, autoscaling annotations, resource limits, the cpu-throttling and startup-cpu-boost annotations, serviceAccountName, probes and ports). JSON service definitions are parsed in full, and gcloud commands by tokenizing the flags. It runs entirely in your browser and uploads none of your config.
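Here is a hypothetical TypeScript sketch of what tokenizing the flags can involve; it is not the auditor's code. It joins backslash continuations, honors single and double quotes, and reads --flag=value, --flag value and bare boolean switches. The BOOLEAN_FLAGS set is an assumption and must list every switch you care about, because a parser that doesn't know a flag is boolean will swallow the next positional argument.

```typescript
// Assumed list of boolean switches; extend it for any other flag you need to parse.
const BOOLEAN_FLAGS = new Set(["allow-unauthenticated", "no-allow-unauthenticated", "cpu-throttling", "no-cpu-throttling"]);

// Split a shell-style command into tokens, respecting quotes and "\" line continuations.
function tokenize(command: string): string[] {
  const text = command.replace(/\\\r?\n/g, " "); // join backslash-continued lines
  const tokens: string[] = [];
  let current = "";
  let quote: string | null = null; // the open quote character, if any
  let inToken = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (quote !== null) {
      if (ch === quote) quote = null; // closing quote
      else current += ch;
    } else if (ch === "'" || ch === '"') {
      quote = ch; // opening quote; quoted text joins the current token
      inToken = true;
    } else if (/\s/.test(ch)) {
      if (inToken) {
        tokens.push(current);
        current = "";
        inToken = false;
      }
    } else {
      current += ch;
      inToken = true;
    }
  }
  if (inToken) tokens.push(current);
  return tokens;
}

// Collect --flag=value, --flag value, and boolean switches (stored as "true").
function parseFlags(tokens: string[]): Map<string, string> {
  const out = new Map<string, string>();
  for (let i = 0; i < tokens.length; i++) {
    const tok = tokens[i];
    if (!tok.startsWith("--")) continue; // positional args: gcloud, run, deploy, service name
    const eq = tok.indexOf("=");
    if (eq !== -1) {
      out.set(tok.slice(2, eq), tok.slice(eq + 1)); // --flag=value
      continue;
    }
    const flag = tok.slice(2);
    const next = tokens[i + 1];
    if (!BOOLEAN_FLAGS.has(flag) && next !== undefined && !next.startsWith("--")) {
      out.set(flag, next); // --flag value
      i++;
    } else {
      out.set(flag, "true"); // boolean switch
    }
  }
  return out;
}

const cmd = `gcloud run deploy api --image=IMAGE \\
  --allow-unauthenticated --concurrency 80 \\
  --set-env-vars="API_KEY=abc,LOG_LEVEL=info"`;
const flags = parseFlags(tokenize(cmd));
console.log(flags.get("concurrency")); // 80
console.log(flags.get("set-env-vars")); // API_KEY=abc,LOG_LEVEL=info
```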


The config is where Cloud Run bills and breaks

By default Cloud Run allocates and bills CPU only while requests are in flight, so a stray --no-cpu-throttling on a warm instance bills around the clock. It scales to a default ceiling of 100 instances when you forget --max-instances, so a retry storm becomes a bill. Plain --set-env-vars stores secrets in plaintext on the revision, and the default compute service account often holds Editor on the whole project.

This auditor encodes those platform rules, plus the usual sizing checks (concurrency against CPU and memory, a bounded request timeout, a startup probe for slow boots), so the expensive mistakes get caught before they show up on the invoice or page you.


Questions & answers

What does the Cloud Run Config Auditor check?
It parses a gcloud run deploy command or a Knative Service YAML/JSON and grades it across security (secrets from Secret Manager not plain env vars, a least-privilege service account, scoped ingress), cost (max-instances bound, CPU allocation mode, the min-instances tradeoff, request timeout), and scaling (concurrency tuned to the instance's CPU and memory). Each finding explains why it matters on Cloud Run and gives the exact flag to change.
Why should secrets use Secret Manager instead of --set-env-vars?
Environment variables set with --set-env-vars are stored in plaintext on the Cloud Run revision and are readable by anyone with the run.services.get permission. A real credential there is effectively published to everyone with view access. Mounting it from Secret Manager with --set-secrets keeps the value out of the revision spec, so the auditor fails a config when an env var name or value looks like a secret.
Why does it flag a missing max-instances as high severity?
Without --max-instances, a service can scale up to the default ceiling of 100 instances under a traffic spike or a retry storm. That turns a bad afternoon into a large bill and floods your database with connections. Setting an explicit max-instances you can afford caps both the cost and the blast radius, which is why the auditor treats it as a high-severity gap.
What's the difference between request-based and always-on CPU billing?
By default Cloud Run allocates CPU only while a request is being handled (request-based), which is the cheapest mode for an API and throttles background work to near zero between requests. Passing --no-cpu-throttling keeps CPU allocated for the instance's whole life so it bills continuously, which is right for websockets, streaming or background work but pure idle spend for a plain request/response service. The tool flags always-on CPU combined with warm instances as likely waste.
Why does high concurrency on a small instance get a warning?
Concurrency is how many requests one instance handles at once, and they all share that instance's CPU and memory. A concurrency of 80 on a 1 vCPU, 256Mi instance oversubscribes it, so tail latency climbs and a burst can push it into an out-of-memory kill. When the auditor can see both the concurrency and a small CPU or memory limit, it warns and suggests either lowering concurrency or giving the instance more resources, confirmed with a load test.
Can it read a YAML service definition, and is anything uploaded?
Yes for both gcloud commands and Cloud Run service definitions. There is no YAML parser bundled, so YAML support is targeted: it reads the specific Knative keys it grades (containerConcurrency, the autoscaling and cpu-throttling annotations, resource limits, serviceAccountName, probes and ports), while JSON definitions are parsed in full. Everything runs in your browser, so nothing you paste is uploaded, sent to a server, or stored.

Want the rest of the deploy looked at?

The config is the floor. I'll review the autoscaling, concurrency, CPU mode, memory limits and secrets wiring that actually decide what Cloud Run costs and whether it stays up. Book a call, or leave your email.

Book a call

No spam. You'll get a reply from me.

Prefer proof first? See how this plays out in real case studies →