
Cloudflare Containers Pricing: The Real Cost Playbook

Cloudflare flipped Containers CPU billing to active usage on November 21, 2025. Translation: if your workloads idle or burst, you can pay dramatically less—without rewriting everything. This playbook breaks down the exact math, the workloads that win (and lose), and a pragmatic 30‑day plan to capture savings while keeping latency and reliability tight. If you run cron jobs, batch ETL, image handling, webhooks, or AI preprocessing at the edge, this is the cost lever you can pull right now.
📅 Published: Nov 28, 2025 · 🏷️ Category: Web development · ⏱️ Read time: 10 min

Cloudflare Containers pricing just changed in a way that actually helps teams: CPU is now billed on active usage, not on what you provision. If your service spends time waiting on network or spikes only under load, your CPU line item can fall sharply—sometimes by half or more—without touching your architecture. This article explains what changed, why it matters, and how to build an immediate plan to benefit from the switch.

Illustration of edge containers and variable CPU usage

What exactly changed—and what didn’t

As of November 21, 2025, Cloudflare bills Containers and Sandboxes CPU based on active CPU time rather than provisioned capacity. The publicly posted rate remains $0.00002 per vCPU‑second for CPU. Memory ($0.0000025 per GB‑second) and disk ($0.00000007 per GB‑second) are unchanged and still tied to provisioned amounts, not utilization.

A concrete example straight from the docs: previously, 1 vCPU running for one hour cost 1 × 3,600 × $0.00002 = $0.072. With the new model, if the service averages 20% CPU utilization, you pay 20% of that: $0.0144. Same instance type, same hour, lower bill, because you only pay for CPU work actually done. The memory and disk lines do not change under this update.
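That arithmetic is easy to sanity-check with a tiny sketch (the rate comes from the posted pricing above; the helper name is ours):

```python
# Reproduce the docs example: 1 vCPU for one hour, old vs. new billing.
CPU_RATE = 0.00002  # $ per vCPU-second (posted Cloudflare rate)

def cpu_cost(vcpus: float, wall_seconds: float, utilization: float = 1.0) -> float:
    """CPU cost in dollars; utilization=1.0 reproduces the old provisioned model."""
    return vcpus * wall_seconds * utilization * CPU_RATE

old = cpu_cost(1, 3600)        # provisioned: the full hour is billed
new = cpu_cost(1, 3600, 0.20)  # active usage: 20% average utilization
print(f"old=${old:.4f} new=${new:.4f}")  # old=$0.0720 new=$0.0144
```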

Cloudflare’s broader “never pay to wait on I/O” philosophy on Workers has been marching this direction for a while. Containers now join that story for CPU. If you’re currently on a mix of Workers and Containers, this change helps standardize your cost mental model across runtimes.

Where Cloudflare shines now (and where it doesn’t)

Here’s the thing: utilization is destiny. If your service spends more time waiting (I/O, queues, APIs) than crunching, paying by active vCPU‑second is great news. If you saturate CPU (e.g., video transcoding, CPU‑bound crypto, heavy image pipelines), the savings may be modest—and you might be memory‑bound anyway.

Winners under the new Cloudflare Containers pricing:

  • Webhook processors with intermittent spikes.
  • Background jobs and cron tasks that run in bursts.
  • Latency‑tolerant ETL and log enrichment with I/O stalls.
  • API frontends that burst on request but mostly await upstreams.

Potential non‑winners:

  • Consistently CPU‑bound workloads (e.g., dense transforms at high QPS).
  • Memory‑heavy services where the provisioned GB‑seconds dominate cost.
  • Large egress flows beyond included transfer; always check your traffic profile.

Reality check: pricing outcomes depend on the blend of CPU, memory, disk, and egress. You’ll want a quick model (below) before declaring victory.

How to do the math for your app

Use this simple, defensible model to forecast monthly costs per service:

CPU = vCPUs × active_seconds × $0.00002
Memory = GB_provisioned × running_seconds × $0.0000025
Disk = GB_provisioned × running_seconds × $0.00000007
Egress = region_rate × GB_out (after any included transfer)
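A minimal sketch of this model in Python (rates from above; the function name and the `egress_rate` parameter are ours, since egress pricing varies by plan and region):

```python
# Rates from the article; memory and disk bill on provisioned size while running.
CPU_RATE  = 0.00002      # $ per vCPU-second (active usage)
MEM_RATE  = 0.0000025    # $ per GB-second (provisioned)
DISK_RATE = 0.00000007   # $ per GB-second (provisioned)

def monthly_cost(vcpus, active_seconds, mem_gb, disk_gb,
                 running_seconds, egress_gb=0.0, egress_rate=0.0):
    """Forecast a service's monthly bill from the four line items.

    active_seconds: actual CPU busy time, not wall-clock.
    running_seconds: wall-clock seconds the instance is up, which
    drives the (still provisioned) memory and disk lines.
    egress_rate is a placeholder -- check your plan's transfer pricing.
    """
    cpu = vcpus * active_seconds * CPU_RATE
    mem = mem_gb * running_seconds * MEM_RATE
    disk = disk_gb * running_seconds * DISK_RATE
    egress = egress_gb * egress_rate
    return {"cpu": cpu, "memory": mem, "disk": disk,
            "egress": egress, "total": cpu + mem + disk + egress}

# e.g., 1 vCPU at 25% utilization, up 12 h/day for 30 days, 1 GB RAM, 1 GB disk:
up = 12 * 3600 * 30
est = monthly_cost(1, up * 0.25, mem_gb=1, disk_gb=1, running_seconds=up)
```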

Active seconds for CPU means actual CPU work time, not wall‑clock. If you don’t have fine‑grained CPU telemetry yet, estimate first: active_seconds ≈ wall_seconds × avg_CPU_utilization. Then verify with runtime metrics once you turn on measurement.

People also ask: How do I estimate vCPU‑seconds if I lack metrics?

Start with a day’s worth of traffic and a few simple probes:

  1. Run the service with request logging plus execution spans. For each request, capture “CPU busy” vs “waiting” time (e.g., using language runtime profilers or built‑in CPU time counters).
  2. Sample across peak and off‑peak windows. Compute a weighted average utilization.
  3. Multiply your 24‑hour wall time by that utilization to get an estimated active CPU time for the day. Repeat for a week to smooth anomalies.
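The three steps above can be sketched with hypothetical probe data (the sample windows and utilization figures here are invented for illustration):

```python
# Hypothetical probe results: (window_seconds, avg_utilization) sampled
# across peak and off-peak windows -- the figures are invented.
samples = [
    (4 * 3600, 0.55),   # peak window
    (8 * 3600, 0.20),   # business hours
    (12 * 3600, 0.05),  # overnight
]

wall = sum(s for s, _ in samples)
weighted_util = sum(s * u for s, u in samples) / wall  # step 2

# Step 3: estimated active CPU time for a 24-hour day.
active_seconds = 24 * 3600 * weighted_util
```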

It’s not perfect, but it gets you within striking distance—and the new billing model is forgiving if your workload is genuinely bursty.

30‑day optimization plan to capture savings

Let’s get practical. Here’s a week‑by‑week plan we’ve used with teams to lock in gains quickly.

Week 0: Baseline and guardrails

  • Turn on application‑level spans to separate CPU busy time from I/O waits.
  • Enable autosleep with a conservative timeout; scale to zero between bursts where safe.
  • Set budget alerts on CPU vCPU‑seconds, memory GB‑seconds, and egress. Create Slack/email alerts at 50/75/90% of target.
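The 50/75/90% alert rule boils down to a simple check; spend figures would come from your billing export (the wiring here is illustrative, not a Cloudflare API):

```python
# Sketch of the 50/75/90% budget alert rule from the guardrails above.
THRESHOLDS = (0.50, 0.75, 0.90)

def fired_alerts(spend_to_date: float, monthly_target: float) -> list:
    """Return every threshold the current spend ratio has crossed."""
    ratio = spend_to_date / monthly_target
    return [t for t in THRESHOLDS if ratio >= t]

print(fired_alerts(80.0, 100.0))  # 80% of target -> [0.5, 0.75]
```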

Week 1: Right‑size and shape demand

  • Right‑size instance types to the smallest memory that avoids swapping or GC thrash; memory is still provisioned billing.
  • Limit concurrency to match actual downstream capacity; avoid amplifying memory footprint during spikes.
  • Batch tiny jobs to reduce cold spins; schedule low‑priority tasks in off‑peak windows.

Week 2: Separate runtimes by job profile

  • Keep I/O‑heavy, short‑lived request code in Workers where “don’t pay to wait” is the default.
  • Use Containers for heavier compute (builds, conversions, non‑latency‑sensitive paths), now cheaper under active CPU billing.
  • Move preprocessing steps (e.g., image resizing hints, tokenization) to the edge location nearest the request to minimize egress and round trips.

Week 3: Cut CPU burn

  • Cache hot results at the edge, keyed by inputs; keep TTLs short enough to stay fresh but long enough to avoid needless recomputation.
  • Profile and remove tiny inefficiencies inside your tight loops; when CPU is billable, micro‑hotspots matter again.
  • Switch to SIMD‑aware or native libraries for transforms where available.

Week 4: Lock in and document

  • Codify scaling policies, autosleep thresholds, and per‑service cost SLOs in your runbooks.
  • Add dashboards that show “$ per 1,000 requests” for CPU, memory, and egress. Review weekly.
  • Schedule a quarterly cost game day to re‑baseline and prevent drift.
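The "$ per 1,000 requests" dashboard metric is a one-liner per spend line (the function name and example figures are ours, for illustration):

```python
# Normalize each spend line to dollars per 1,000 requests for the dashboard.
def cost_per_1k_requests(cpu_usd: float, mem_usd: float,
                         egress_usd: float, requests: int) -> dict:
    scale = 1000 / requests
    return {"cpu": cpu_usd * scale,
            "memory": mem_usd * scale,
            "egress": egress_usd * scale}

# e.g., $6.48 CPU and $3.24 memory spread over 1.2M monthly requests:
m = cost_per_1k_requests(6.48, 3.24, 0.0, 1_200_000)
```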

If you’d like hands‑on help building this plan, see what our team delivers on cost engineering and platform migration engagements.

Before/after scenarios you can adapt

Scenario A: Webhook processor
Traffic: spiky; average 25% CPU busy during run; 1 vCPU; 1 GB RAM; 12 hours/day active; tiny disk.
CPU: 1 × (12×3,600) × $0.00002 × 0.25 = $0.216/day → ~$6.48/month (30 days).
Memory: 1 GB × (12×3,600) × $0.0000025 = $0.108/day → ~$3.24/month.
Disk: negligible (say 1 GB): 1 × (12×3,600) × $0.00000007 ≈ $0.003/day → ~$0.09/month.
Total ≈ $9.81/month, excluding egress. Under the old CPU policy (provisioned), CPU for the same hours at 1 vCPU would have been 1 × (12×3,600) × $0.00002 = $0.864/day → ~$25.92/month. Savings: 75% on CPU for this service, matching the 25% average utilization.

Scenario B: CPU‑bound image pipeline
Traffic: steady; 1 vCPU saturated; 2 GB RAM; 24×7.
CPU: 1 × (30×24×3,600) × $0.00002 ≈ $51.84/month (unchanged vs old model since utilization ~100%).
Memory: 2 × (30×24×3,600) × $0.0000025 ≈ $12.96/month.
Outcome: savings depend on improving CPU efficiency or reducing memory, not billing model.
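Both scenarios can be re-derived in a few lines (rates and durations taken from the figures above):

```python
CPU_RATE, MEM_RATE = 0.00002, 0.0000025  # $ per vCPU-second / GB-second

# Scenario A: 1 vCPU, 25% busy, up 12 h/day for 30 days.
a_up = 12 * 3600 * 30
a_cpu_new = 1 * a_up * 0.25 * CPU_RATE  # active-usage billing
a_cpu_old = 1 * a_up * CPU_RATE         # old provisioned billing
a_savings = 1 - a_cpu_new / a_cpu_old   # 25% utilization -> 75% CPU savings

# Scenario B: 1 vCPU saturated, 2 GB RAM, 24x7 for 30 days.
b_up = 30 * 24 * 3600
b_cpu = 1 * b_up * CPU_RATE             # same under either model at ~100% util
b_mem = 2 * b_up * MEM_RATE
```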

People also ask: Will autoscaling cause surprise bills?

It can—if you don’t set guardrails. Do this:

  • Create per‑service spending SLOs and enforce them with alerts at CPU vCPU‑seconds and memory GB‑seconds thresholds.
  • Set a maximum concurrency per pod/container to limit memory spikes.
  • Keep autosleep timeouts short for stateless services; default to scale‑to‑zero when possible.

Finally, give your on‑call a single dashboard showing both performance and spend so they can trade off safely when incidents hit.

The Goldilocks Grid: choosing Workers vs Containers

Use this quick grid when deciding where code runs:

  • Workers: Ultra‑low latency edges, I/O heavy, short CPU bursts, request/response shaping, auth, caching, simple transforms. You benefit from not paying to wait on I/O.
  • Containers: Longer‑running tasks, language/runtime needs outside Workers constraints, build tools, binaries, CPU work that’s still intermittent. Now charged only for active CPU time.

If AI is in your mix (model serving, embeddings, re‑ranking), keep an eye on how Cloudflare’s ecosystem is evolving. We covered the developer angle when Cloudflare announced the Replicate acquisition in our analysis: how to plan AI builds on Cloudflare.

Memory, disk, and egress still matter—here’s how to win

Because memory and disk pricing remains provisioned, right‑sizing is your best lever:

  • Set memory no higher than your p95 under load plus a 15–25% headroom. Watch for GC pressure and OOMs, then tune.
  • Use ephemeral disk for scratch, keep it small, and push large artifacts to object storage.
  • Co‑locate services and caches to minimize egress; edge‑cache aggressively. If you routinely blow past included transfer, move heavy responses to a CDN path with cache‑hits.
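The p95-plus-headroom rule above can be sketched as follows (the 0.25 GB instance step is an assumption; round to whatever sizes your instance types actually offer):

```python
import math

def right_size_memory(p95_gb: float, headroom: float = 0.20,
                      step_gb: float = 0.25) -> float:
    """Provision memory at p95 under load plus 15-25% headroom (20% here),
    rounded up to the instance size step. The 0.25 GB step is an assumption."""
    target = p95_gb * (1 + headroom)
    return math.ceil(target / step_gb) * step_gb

print(right_size_memory(0.9))  # p95 of 0.9 GB -> provision 1.25 GB
```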

Risk list and edge cases to watch

There’s always a catch or three:

  • High utilization: If your CPU hovers 70–100%, savings are thin. Focus on profiling and algorithmic improvements.
  • Memory creep: Teams lower CPU but leave heap sizes untouched; memory GB‑seconds can quietly dominate.
  • Over‑eager autoscaling: Burst concurrency drives up memory. Cap concurrency, rate‑limit upstream triggers, and batch.
  • Stateful work: If you rely on in‑memory state, aggressive autosleep might hurt. Introduce a fast shared cache or queue to keep state external.

What about GitHub/Vercel/others in your stack?

Cost work is very cross‑platform. If you run previews or frontends elsewhere, make sure those bills don’t backslide as you optimize Cloudflare. We recently broke down how to keep modern hosting predictable in Vercel Pro Pricing: model costs and cut spend, and we maintain a live view on CI changes in GitHub Actions billing: your Dec 1 playbook. Align guardrails and alerts across all vendors—your CFO doesn’t care which invoice surprised you.

People also ask: Should I move everything to Containers now?

No. The best outcomes pair Workers (don’t pay to wait) with Containers (only pay when CPU is busy). Keep request‑path logic, auth, and cache warmers in Workers. Run bursty compute and tooling in Containers. If you’re migrating from a traditional VM or Kubernetes setup, start by moving the bursty, I/O‑heavy services first—that’s where the new pricing lands best.

Team planning Workers and Containers rollout on whiteboard

Action checklist: Ship this in the next two weeks

  • Add CPU busy vs wait spans to your top three edge services.
  • Set autosleep thresholds; verify scale‑to‑zero on non‑critical tasks.
  • Right‑size memory; track GB‑seconds weekly. Aim for a 20% reduction first pass.
  • Move at least one bursty job from VMs/K8s to Containers to validate the new CPU bill.
  • Publish a one‑pager with your spend SLOs and alert thresholds.

Need an outside view or a structured engagement? Reach out via our contacts page, or review our recent platform cost projects to see what similar teams shipped.

Zooming out: why this update matters

Paying for actual CPU work instead of capacity nudges teams toward better architecture: separating latency‑critical I/O from bursty compute, pushing hot paths to the edge, and giving platform teams measurable, controllable KPIs. It aligns finance and engineering without turning every release into a billing debate. With a little instrumentation and a few policy tweaks, you can take advantage of Cloudflare Containers pricing immediately—and keep those gains quarter after quarter.

For deeper analysis of this change and adjacent updates across the ecosystem, see our earlier breakdown: how the pricing switch cuts CPU costs. If you’re mapping multi‑vendor strategy or AI workloads at the edge, our team has shipped those paths too—start here: services overview.

Cost dashboards visualizing CPU, memory, and alerts
Written by Viktoria Sulzhyk · BYBOWU
