
Cloudflare Containers Pricing: The Real Cost Playbook

Cloudflare flipped Containers CPU billing to active usage on November 21, 2025. Translation: if your workloads idle or burst, you can pay dramatically less—without rewriting everything. This playbook breaks down the exact math, the workloads that win (and lose), and a pragmatic 30‑day plan to capture savings while keeping latency and reliability tight. If you run cron jobs, batch ETL, image handling, webhooks, or AI preprocessing at the edge, this is the cost lever you can pull right now.
📅 Published: Nov 28, 2025 · 🏷️ Category: Web development · ⏱️ Read time: 10 min

Cloudflare Containers pricing just changed in a way that actually helps teams: CPU is now billed on active usage, not on what you provision. If your service spends time waiting on network or spikes only under load, your CPU line item can fall sharply—sometimes by half or more—without touching your architecture. This article explains what changed, why it matters, and how to build an immediate plan to benefit from the switch.

Illustration of edge containers and variable CPU usage

What exactly changed—and what didn’t

As of November 21, 2025, Cloudflare bills Containers and Sandboxes CPU based on active CPU time rather than provisioned capacity. The publicly posted rate remains $0.00002 per vCPU‑second for CPU. Memory ($0.0000025 per GB‑second) and disk ($0.00000007 per GB‑second) are unchanged and still tied to provisioned amounts, not utilization.

A concrete example straight from the docs: previously, 1 vCPU running for one hour cost 1 × 3,600 × $0.00002 = $0.072. With the new model, if the service averages 20% CPU utilization, you pay 20% of that: $0.0144. Same instance type, same hour, lower bill, because you only pay for CPU work actually done. The memory and disk lines do not change under this update.
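That arithmetic is easy to sanity-check with a tiny sketch (the rate comes from the posted pricing above; the helper name is ours):

```python
# Reproduce the docs example: 1 vCPU for one hour, old vs. new billing.
CPU_RATE = 0.00002  # $ per vCPU-second (posted Cloudflare rate)

def cpu_cost(vcpus: float, wall_seconds: float, utilization: float = 1.0) -> float:
    """CPU cost in dollars; utilization=1.0 reproduces the old provisioned model."""
    return vcpus * wall_seconds * utilization * CPU_RATE

old = cpu_cost(1, 3600)        # provisioned: the full hour is billed
new = cpu_cost(1, 3600, 0.20)  # active usage: 20% average utilization
print(f"old=${old:.4f} new=${new:.4f}")  # old=$0.0720 new=$0.0144
```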

Cloudflare’s broader “never pay to wait on I/O” philosophy on Workers has been marching this direction for a while. Containers now join that story for CPU. If you’re currently on a mix of Workers and Containers, this change helps standardize your cost mental model across runtimes.

Where Cloudflare shines now (and where it doesn’t)

Here’s the thing: utilization is destiny. If your service spends more time waiting (I/O, queues, APIs) than crunching, paying by active vCPU‑second is great news. If you saturate CPU (e.g., video transcoding, CPU‑bound crypto, heavy image pipelines), the savings may be modest—and you might be memory‑bound anyway.

Winners under the new Cloudflare Containers pricing:

  • Webhook processors with intermittent spikes.
  • Background jobs and cron tasks that run in bursts.
  • Latency‑tolerant ETL and log enrichment with I/O stalls.
  • API frontends that burst on request but mostly await upstreams.

Potential non‑winners:

  • Consistently CPU‑bound workloads (e.g., dense transforms at high QPS).
  • Memory‑heavy services where the provisioned GB‑seconds dominate cost.
  • Large egress flows beyond included transfer; always check your traffic profile.

Reality check: pricing outcomes depend on the blend of CPU, memory, disk, and egress. You’ll want a quick model (below) before declaring victory.

How to do the math for your app

Use this simple, defensible model to forecast monthly costs per service:

CPU = vCPUs × active_seconds × $0.00002
Memory = GB_provisioned × running_seconds × $0.0000025
Disk = GB_provisioned × running_seconds × $0.00000007
Egress = region_rate × GB_out (after any included transfer)
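A minimal sketch of this model in Python (rates from above; the function name and the `egress_rate` parameter are ours, since egress pricing varies by plan and region):

```python
# Rates from the article; memory and disk bill on provisioned size while running.
CPU_RATE  = 0.00002      # $ per vCPU-second (active usage)
MEM_RATE  = 0.0000025    # $ per GB-second (provisioned)
DISK_RATE = 0.00000007   # $ per GB-second (provisioned)

def monthly_cost(vcpus, active_seconds, mem_gb, disk_gb,
                 running_seconds, egress_gb=0.0, egress_rate=0.0):
    """Forecast a service's monthly bill from the four line items.

    active_seconds: actual CPU busy time, not wall-clock.
    running_seconds: wall-clock seconds the instance is up, which
    drives the (still provisioned) memory and disk lines.
    egress_rate is a placeholder -- check your plan's transfer pricing.
    """
    cpu = vcpus * active_seconds * CPU_RATE
    mem = mem_gb * running_seconds * MEM_RATE
    disk = disk_gb * running_seconds * DISK_RATE
    egress = egress_gb * egress_rate
    return {"cpu": cpu, "memory": mem, "disk": disk,
            "egress": egress, "total": cpu + mem + disk + egress}

# e.g., 1 vCPU at 25% utilization, up 12 h/day for 30 days, 1 GB RAM, 1 GB disk:
up = 12 * 3600 * 30
est = monthly_cost(1, up * 0.25, mem_gb=1, disk_gb=1, running_seconds=up)
```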

Active seconds for CPU means actual CPU work time, not wall‑clock. If you don’t have fine‑grained CPU telemetry yet, estimate first: active_seconds ≈ wall_seconds × avg_CPU_utilization. Then verify with runtime metrics once you turn on measurement.

People also ask: How do I estimate vCPU‑seconds if I lack metrics?

Start with a day’s worth of traffic and a few simple probes:

  1. Run the service with request logging plus execution spans. For each request, capture “CPU busy” vs “waiting” time (e.g., using language runtime profilers or built‑in CPU time counters).
  2. Sample across peak and off‑peak windows. Compute a weighted average utilization.
  3. Multiply your 24‑hour wall time by that utilization to get an estimated active CPU time for the day. Repeat for a week to smooth anomalies.
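The three steps above can be sketched with hypothetical probe data (the sample windows and utilization figures here are invented for illustration):

```python
# Hypothetical probe results: (window_seconds, avg_utilization) sampled
# across peak and off-peak windows -- the figures are invented.
samples = [
    (4 * 3600, 0.55),   # peak window
    (8 * 3600, 0.20),   # business hours
    (12 * 3600, 0.05),  # overnight
]

wall = sum(s for s, _ in samples)
weighted_util = sum(s * u for s, u in samples) / wall  # step 2

# Step 3: estimated active CPU time for a 24-hour day.
active_seconds = 24 * 3600 * weighted_util
```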

It’s not perfect, but it gets you within striking distance—and the new billing model is forgiving if your workload is genuinely bursty.

30‑day optimization plan to capture savings

Let’s get practical. Here’s a week‑by‑week plan we’ve used with teams to lock in gains quickly.

Week 0: Baseline and guardrails

  • Turn on application‑level spans to separate CPU busy time from I/O waits.
  • Enable autosleep with a conservative timeout; scale to zero between bursts where safe.
  • Set budget alerts on CPU vCPU‑seconds, memory GB‑seconds, and egress. Create Slack/email alerts at 50/75/90% of target.
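The 50/75/90% alert rule boils down to a simple check; spend figures would come from your billing export (the wiring here is illustrative, not a Cloudflare API):

```python
# Sketch of the 50/75/90% budget alert rule from the guardrails above.
THRESHOLDS = (0.50, 0.75, 0.90)

def fired_alerts(spend_to_date: float, monthly_target: float) -> list:
    """Return every threshold the current spend ratio has crossed."""
    ratio = spend_to_date / monthly_target
    return [t for t in THRESHOLDS if ratio >= t]

print(fired_alerts(80.0, 100.0))  # 80% of target -> [0.5, 0.75]
```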

Week 1: Right‑size and shape demand

  • Right‑size instance types to the smallest memory that avoids swapping or GC thrash; memory is still provisioned billing.
  • Limit concurrency to match actual downstream capacity; avoid amplifying memory footprint during spikes.
  • Batch tiny jobs to reduce cold spins; schedule low‑priority tasks in off‑peak windows.

Week 2: Separate runtimes by job profile

  • Keep I/O‑heavy, short‑lived request code in Workers where “don’t pay to wait” is the default.
  • Use Containers for heavier compute (builds, conversions, non‑latency‑sensitive paths), now cheaper under active CPU billing.
  • Move preprocessing steps (e.g., image resizing hints, tokenization) to the edge location nearest the request to minimize egress and round trips.

Week 3: Cut CPU burn

  • Cache hot results at the edge, keyed by inputs; keep TTLs short enough to stay fresh but long enough to avoid needless recomputation.
  • Profile and remove tiny inefficiencies inside your tight loops; when CPU is billable, micro‑hotspots matter again.
  • Switch to SIMD‑aware or native libraries for transforms where available.

Week 4: Lock in and document

  • Codify scaling policies, autosleep thresholds, and per‑service cost SLOs in your runbooks.
  • Add dashboards that show “$ per 1,000 requests” for CPU, memory, and egress. Review weekly.
  • Schedule a quarterly cost game day to re‑baseline and prevent drift.
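The "$ per 1,000 requests" dashboard metric is a one-liner per spend line (the function name and example figures are ours, for illustration):

```python
# Normalize each spend line to dollars per 1,000 requests for the dashboard.
def cost_per_1k_requests(cpu_usd: float, mem_usd: float,
                         egress_usd: float, requests: int) -> dict:
    scale = 1000 / requests
    return {"cpu": cpu_usd * scale,
            "memory": mem_usd * scale,
            "egress": egress_usd * scale}

# e.g., $6.48 CPU and $3.24 memory spread over 1.2M monthly requests:
m = cost_per_1k_requests(6.48, 3.24, 0.0, 1_200_000)
```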

If you’d like hands‑on help building this plan, see what our team delivers on cost engineering and platform migration engagements.

Before/after scenarios you can adapt

Scenario A: Webhook processor
Traffic: spiky; average 25% CPU busy during run; 1 vCPU; 1 GB RAM; 12 hours/day active; tiny disk.
CPU: 1 × (12×3,600) × $0.00002 × 0.25 = $0.216/day → ~$6.48/month (30 days).
Memory: 1 GB × (12×3,600) × $0.0000025 = $0.108/day → ~$3.24/month.
Disk: negligible (say 1 GB): 1 × (12×3,600) × $0.00000007 ≈ $0.003/day → ~$0.09/month.
Total ≈ $9.81/month, excluding egress. Under the old CPU policy (provisioned), CPU for the same hours at 1 vCPU would have been 1 × (12×3,600) × $0.00002 = $0.864/day → ~$25.92/month. Savings: 75% on CPU for this service, matching the 25% average utilization.

Scenario B: CPU‑bound image pipeline
Traffic: steady; 1 vCPU saturated; 2 GB RAM; 24×7.
CPU: 1 × (30×24×3,600) × $0.00002 ≈ $51.84/month (unchanged vs old model since utilization ~100%).
Memory: 2 × (30×24×3,600) × $0.0000025 ≈ $12.96/month.
Outcome: savings depend on improving CPU efficiency or reducing memory, not billing model.
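Both scenarios can be re-derived in a few lines (rates and durations taken from the figures above):

```python
CPU_RATE, MEM_RATE = 0.00002, 0.0000025  # $ per vCPU-second / GB-second

# Scenario A: 1 vCPU, 25% busy, up 12 h/day for 30 days.
a_up = 12 * 3600 * 30
a_cpu_new = 1 * a_up * 0.25 * CPU_RATE  # active-usage billing
a_cpu_old = 1 * a_up * CPU_RATE         # old provisioned billing
a_savings = 1 - a_cpu_new / a_cpu_old   # 25% utilization -> 75% CPU savings

# Scenario B: 1 vCPU saturated, 2 GB RAM, 24x7 for 30 days.
b_up = 30 * 24 * 3600
b_cpu = 1 * b_up * CPU_RATE             # same under either model at ~100% util
b_mem = 2 * b_up * MEM_RATE
```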

People also ask: Will autoscaling cause surprise bills?

It can—if you don’t set guardrails. Do this:

  • Create per‑service spending SLOs and enforce them with alerts at CPU vCPU‑seconds and memory GB‑seconds thresholds.
  • Set a maximum concurrency per pod/container to limit memory spikes.
  • Keep autosleep timeouts short for stateless services; default to scale‑to‑zero when possible.

Finally, give your on‑call a single dashboard showing both performance and spend so they can trade off safely when incidents hit.

The Goldilocks Grid: choosing Workers vs Containers

Use this quick grid when deciding where code runs:

  • Workers: Ultra‑low latency edges, I/O heavy, short CPU bursts, request/response shaping, auth, caching, simple transforms. You benefit from not paying to wait on I/O.
  • Containers: Longer‑running tasks, language/runtime needs outside Workers constraints, build tools, binaries, CPU work that’s still intermittent. Now charged only for active CPU time.

If AI is in your mix (model serving, embeddings, re‑ranking), keep an eye on how Cloudflare’s ecosystem is evolving. We covered the developer angle when Cloudflare announced the Replicate acquisition in our analysis: how to plan AI builds on Cloudflare.

Memory, disk, and egress still matter—here’s how to win

Because memory and disk pricing remains provisioned, right‑sizing is your best lever:

  • Set memory no higher than your p95 under load plus a 15–25% headroom. Watch for GC pressure and OOMs, then tune.
  • Use ephemeral disk for scratch, keep it small, and push large artifacts to object storage.
  • Co‑locate services and caches to minimize egress; edge‑cache aggressively. If you routinely blow past included transfer, move heavy responses to a CDN path with cache‑hits.
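The p95-plus-headroom rule above can be sketched as follows (the 0.25 GB instance step is an assumption; round to whatever sizes your instance types actually offer):

```python
import math

def right_size_memory(p95_gb: float, headroom: float = 0.20,
                      step_gb: float = 0.25) -> float:
    """Provision memory at p95 under load plus 15-25% headroom (20% here),
    rounded up to the instance size step. The 0.25 GB step is an assumption."""
    target = p95_gb * (1 + headroom)
    return math.ceil(target / step_gb) * step_gb

print(right_size_memory(0.9))  # p95 of 0.9 GB -> provision 1.25 GB
```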

Risk list and edge cases to watch

There’s always a catch or three:

  • High utilization: If your CPU hovers 70–100%, savings are thin. Focus on profiling and algorithmic improvements.
  • Memory creep: Teams lower CPU but leave heap sizes untouched; memory GB‑seconds can quietly dominate.
  • Over‑eager autoscaling: Burst concurrency drives up memory. Cap concurrency, rate‑limit upstream triggers, and batch.
  • Stateful work: If you rely on in‑memory state, aggressive autosleep might hurt. Introduce a fast shared cache or queue to keep state external.

What about GitHub/Vercel/others in your stack?

Cost work is very cross‑platform. If you run previews or frontends elsewhere, make sure those bills don’t backslide as you optimize Cloudflare. We recently broke down how to keep modern hosting predictable in Vercel Pro Pricing: model costs and cut spend, and we maintain a live view on CI changes in GitHub Actions billing: your Dec 1 playbook. Align guardrails and alerts across all vendors—your CFO doesn’t care which invoice surprised you.

People also ask: Should I move everything to Containers now?

No. The best outcomes pair Workers (don’t pay to wait) with Containers (only pay when CPU is busy). Keep request‑path logic, auth, and cache warmers in Workers. Run bursty compute and tooling in Containers. If you’re migrating from a traditional VM or Kubernetes setup, start by moving the bursty, I/O‑heavy services first—that’s where the new pricing lands best.

Team planning Workers and Containers rollout on whiteboard

Action checklist: Ship this in the next two weeks

  • Add CPU busy vs wait spans to your top three edge services.
  • Set autosleep thresholds; verify scale‑to‑zero on non‑critical tasks.
  • Right‑size memory; track GB‑seconds weekly. Aim for a 20% reduction first pass.
  • Move at least one bursty job from VMs/K8s to Containers to validate the new CPU bill.
  • Publish a one‑pager with your spend SLOs and alert thresholds.

Need an outside view or a structured engagement? Reach out via our contacts page, or review our recent platform cost projects to see what similar teams shipped.

Zooming out: why this update matters

Paying for actual CPU work instead of capacity nudges teams toward better architecture: separating latency‑critical I/O from bursty compute, pushing hot paths to the edge, and giving platform teams measurable, controllable KPIs. It aligns finance and engineering without turning every release into a billing debate. With a little instrumentation and a few policy tweaks, you can take advantage of Cloudflare Containers pricing immediately—and keep those gains quarter after quarter.

For deeper analysis of this change and adjacent updates across the ecosystem, see our earlier breakdown: how the pricing switch cuts CPU costs. If you’re mapping multi‑vendor strategy or AI workloads at the edge, our team has shipped those paths too—start here: services overview.

Cost dashboards visualizing CPU, memory, and alerts
Written by Viktoria Sulzhyk · BYBOWU
