
Cloudflare Containers Pricing Switch: Cut CPU Costs Now

Cloudflare just flipped how Containers and Sandboxes charge for CPU. Instead of paying for provisioned capacity, you’re billed by active vCPU-seconds. Translation: if your workloads are bursty or I/O-heavy, your bill can drop fast—if you tune a few knobs. This piece breaks down the math, shows what actually changed, and gives you a practical, 60-minute playbook to lock in savings without risking downtime. We’ll cover instance sizing, concurrency, autosleep, and the observability you need.
Published: Nov 24, 2025 · Category: Web development · Read time: 11 min

Cloudflare Containers pricing just shifted in your favor: CPU is now billed on active usage instead of provisioned capacity. If you’ve been paying for a 1 vCPU instance that idles most of the hour, you finally stop lighting money on fire. In this guide, I’ll translate the change into real numbers, show how it affects Sandboxes as well, and walk you through a no-drama checklist to capture savings without breaking production.

Illustration of CPU utilization decreasing on a cloud dashboard

What exactly changed in Cloudflare Containers pricing?

Previously, CPU charges effectively mapped to the capacity you allocated. If you ran a standard-2 container (up to 1 vCPU) for an hour, you were on the hook for that hour of vCPU time, even if your service used only a fraction of it. Now, CPU cost tracks actual work performed, measured in vCPU-seconds. Cloudflare’s published CPU rate is $0.000020 per vCPU-second, and you’re billed for the vCPU-seconds your container consumes—not the ceiling you provisioned.

Memory and disk pricing did not change. Those remain charged on provisioned resources. Egress pricing is unchanged too, with generous included bandwidth and low regional rates. That means the savings lever here is CPU utilization, full stop.

Quick math: before vs. after

Consider a standard-2 instance (1 vCPU) running one hour:

  • Old model (pay for allocation): 1 vCPU × 3600s × $0.000020 = $0.072 CPU cost.
  • New model (pay for usage): If your average CPU utilization is 20%, cost becomes 0.2 × $0.072 = $0.0144.

At 50% utilization, that’s roughly $0.036. If you sometimes spike to 100% but idle most of the hour, your true effective rate lands somewhere in between. This is particularly friendly to I/O-bound APIs, event-driven services, and agent-style workloads that burst then wait.
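If you want to sanity-check that arithmetic in code, here is the same before/after comparison as a few lines of TypeScript. The rate and ceiling are the figures quoted above; nothing here is a Cloudflare API, just the math.

```typescript
// Figures from the example above: 1 vCPU ceiling, one hour, $0.000020/vCPU-s.
const CPU_RATE = 0.00002; // dollars per vCPU-second
const VCPU_CEILING = 1;   // standard-2 example
const SECONDS = 3600;     // one always-on hour

// Old model: billed for the full allocation regardless of utilization.
const oldCost = VCPU_CEILING * SECONDS * CPU_RATE;

// New model: billed for active vCPU-seconds, i.e. allocation × utilization.
const utilization = 0.2; // 20% average CPU
const newCost = oldCost * utilization;

console.log(`old: $${oldCost.toFixed(4)}  new: $${newCost.toFixed(4)}`);
// old: $0.0720  new: $0.0144
```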

Cloudflare Containers pricing: details that matter

Let’s anchor on the concrete bits you’ll need for modeling:

  • CPU: $0.000020 per vCPU-second, measured by active usage.
  • Memory: charged by GiB-seconds, with included monthly memory on paid plans.
  • Disk: charged by GB-seconds; unchanged in this switch.
  • Egress: region-based rates with included allotments (e.g., NA/EU commonly $0.025/GB beyond the included quota).
  • Included CPU: paid plans include a baseline (e.g., hundreds of vCPU-minutes) that offsets small workloads.

Instance types still cap the maximum vCPU, memory, and disk per instance—think of them as guardrails for performance and concurrency. You size instances for peak needs, but you now pay for the CPU you actually use, not for the peak you provisioned.

Which workloads benefit most?

Three patterns win immediately:

  • I/O-bound APIs that await upstreams (databases, third-party HTTP) far more than they chew CPU.
  • Event and queue consumers that burst in short spurts and then idle.
  • Agentic or cron-like tasks with uneven duty cycles—periodic heavy compute, long periods of waiting.

Compute-heavy services with sustained 80–100% CPU won’t see dramatic cuts; they’ll just get a fairer, more predictable mapping between work performed and what you pay. And if you’ve historically oversized instances for headroom, that’s okay—oversizing no longer penalizes CPU cost directly. It can still affect memory and disk charges, so right-size thoughtfully.

How the change affects Cloudflare Sandboxes

Sandboxes follow the same rule change for CPU: usage-based instead of allocation-based. If you rely on Sandboxes for secure execution of untrusted code or per-tenant isolation, you get the same utilization-driven savings profile. The same tuning guidance—measuring actual CPU, capping burst concurrency, and disabling busy loops—applies here too.

Modeling your new bill in 10 minutes

Grab last week’s metrics and a calculator. You only need three inputs per service:

  1. Avg CPU utilization during typical hours (or percentile bands if you’re fancy).
  2. Instance type vCPU ceiling (e.g., 1 vCPU for standard-2).
  3. Active runtime per hour (seconds the container is actually live).

CPU cost formula now becomes: vCPU ceiling × seconds live × utilization × $0.000020. If your container is live for 3600s and averages 18% CPU, with 1 vCPU ceiling: 1 × 3600 × 0.18 × 0.000020 ≈ $0.013 for that hour. Roll it up by hours and instances, then subtract included CPU minutes to estimate your net.
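If you’d rather script the model than punch a calculator, here’s that formula as a tiny TypeScript helper. The function name and signature are mine for illustration, not anything Cloudflare ships.

```typescript
// Hourly CPU cost per the formula above. Illustrative helper, not an API.
function hourlyCpuCost(
  vcpuCeiling: number,  // instance type's vCPU cap, e.g. 1 for standard-2
  liveSeconds: number,  // seconds the container was actually running
  utilization: number,  // average CPU as a fraction, 0..1
  ratePerVcpuSecond = 0.00002,
): number {
  return vcpuCeiling * liveSeconds * utilization * ratePerVcpuSecond;
}

// 1 vCPU ceiling, live the full hour, averaging 18% CPU:
console.log(hourlyCpuCost(1, 3600, 0.18)); // 0.01296, i.e. about $0.013
```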

People also ask: Does this change memory or disk costs?

No. Memory and disk still bill from provisioned levels. If you oversize your instance type mostly for memory, you’ll continue paying for that memory footprint regardless of CPU utilization. The trick is balancing enough memory to avoid swapping or OOMs while not climbing instance types unnecessarily.

People also ask: What about egress—could it erase the CPU savings?

It can if your workload is bandwidth-heavy and operating well past the included allotment. For typical API or worker patterns, egress is modest. For media, data export, or AI inference with chunky payloads, egress will likely dominate. Always check your regional rate and included TB/GB before declaring victory.

Let’s get practical: a 60-minute optimization playbook

You don’t need a six-week project to benefit. Here’s a one-hour plan I’ve used on real workloads to capture savings fast:

0–15 minutes: Instrument the right signals

  • Enable per-instance CPU usage metrics. Track average and p95 across business hours and off-peak (a per-route measurement sketch follows this list).
  • Record live seconds per container (how long the instance is actually running) to spot sleep opportunities.
  • Log concurrency per route or queue consumer. Tie spikes to specific traffic sources.
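If your service runs on Node.js, one cheap way to get per-route CPU numbers is taking a delta of process.cpuUsage() around each handler. It measures the whole process, so under concurrency it over-attributes; treat it as a trend signal, not a meter. The wrapper below is a sketch, and withCpuMetric is a name I made up.

```typescript
// Rough per-route CPU attribution via Node's process.cpuUsage() deltas.
// Over-attributes under concurrency (it's a whole-process counter), so use
// it for trends and p95s, not exact billing math.
async function withCpuMetric<T>(route: string, handler: () => Promise<T>): Promise<T> {
  const start = process.cpuUsage(); // cumulative user+system microseconds
  try {
    return await handler();
  } finally {
    const delta = process.cpuUsage(start); // usage since `start`
    const cpuMs = (delta.user + delta.system) / 1000;
    console.log(JSON.stringify({ metric: 'route_cpu_ms', route, cpuMs }));
  }
}
```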

15–30 minutes: Kill accidental CPU burn

  • Replace busy loops with event hooks, timers, or backoff. Polling every 50ms adds up fast (see the backoff sketch after this list).
  • Audit JSON parsing and compression on hot paths; use streaming parsers and sane compression levels.
  • Trim cryptography overhead by caching verified keys and reusing contexts where safe.
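On the first bullet: if you must poll, poll with exponential backoff and jitter instead of a hot 50ms loop. A minimal sketch, where checkReady stands in for whatever condition you were busy-polling:

```typescript
// Waits for a condition with exponential backoff plus jitter, so idle
// periods cost near-zero CPU instead of a wakeup every 50ms.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function waitFor(checkReady: () => Promise<boolean>, maxMs = 30_000): Promise<boolean> {
  let delay = 100; // start at 100ms
  const deadline = Date.now() + maxMs;
  while (Date.now() < deadline) {
    if (await checkReady()) return true;
    await sleep(delay + Math.random() * delay); // jitter avoids sync-up
    delay = Math.min(delay * 2, 5_000);         // double, capped at 5s
  }
  return false; // timed out; surface this to the caller
}
```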

30–45 minutes: Right-size and cap

  • Select an instance type for peak load, not average, then cap concurrency to avoid thrash (a minimal concurrency-cap sketch follows this list). Let the platform scale instances horizontally.
  • Enable or shorten autosleep timeouts on low-traffic services. Scale-to-zero is your friend now.
  • Move non-critical CPU from request paths to background jobs or scheduled work.
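Capping concurrency doesn’t require a framework. Here’s a minimal semaphore sketch you can put in front of expensive handlers; the cap value is something you’d tune per instance type.

```typescript
// A tiny semaphore: at most `max` tasks run at once, the rest queue.
// Keeps one instance from thrashing when traffic fans out.
class Semaphore {
  private waiters: Array<() => void> = [];
  private inFlight = 0;
  constructor(private readonly max: number) {}

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.inFlight >= this.max) {
      await new Promise<void>((resolve) => this.waiters.push(resolve));
    }
    this.inFlight++;
    try {
      return await task();
    } finally {
      this.inFlight--;
      this.waiters.shift()?.(); // wake the next queued task, if any
    }
  }
}

const cpuBound = new Semaphore(4); // tune the cap to your instance type
// usage: await cpuBound.run(() => handleExpensiveJob(msg));
```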

45–60 minutes: Validate and lock

  • Run a load test mirroring real traffic. Verify p95 latency and error rates are unchanged or better.
  • Set spend alerts by service and account. If your org is new to usage-based CPU, finance will appreciate early warnings.
  • Ship a weekly utilization report to stakeholders: CPU %, runtime seconds, and projected cost deltas.

Choosing instance types under usage-based CPU

With CPU now usage-billed, the instance type choice is more about performance ceilings and memory fit than direct CPU dollars. A few heuristics:

  • Latency-sensitive APIs: Prefer higher vCPU ceilings (standard-2 or above) if you need consistent single-thread headroom; you won’t pay extra CPU unless you use it.
  • Memory-bound services: Choose the smallest type that comfortably accommodates your live set plus peak allocations. Avoid tail OOMs or garbage collector thrash.
  • Queue consumers: Set per-instance concurrency to a safe upper bound and scale horizontally. Horizontal scaling raises live seconds, but each instance stays efficient and responsive.
Developer reviewing cloud cost dashboard on laptop

Gotchas that can nuke your savings

There’s always a catch or three:

  • Sticky busywork: Cron tasks that run every minute and do 200ms of compute can keep instances from sleeping. Batch them or increase intervals.
  • Chatty retries: Over-aggressive retry policies produce burst CPU with no business value. Apply jittered backoff and idempotency keys.
  • Warmers: Custom “keep warm” pingers to fight cold starts can backfire by keeping CPU and live seconds high. Re-evaluate now—cold starts may be cheaper than burn.
  • Misplaced compression: Compressing tiny JSON responses can cost more CPU than the bandwidth you save. Thresholds matter (see the sketch after this list).
  • Debug logging: Verbose logging in hot paths wastes CPU. Keep it at info level with selective sampling.
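For the compression gotcha, the fix is a size gate. A sketch using Node’s zlib; the 1 KiB threshold is illustrative, so measure your own break-even point:

```typescript
import { gzipSync } from 'node:zlib';

// Skip compression for small bodies: the CPU spent can exceed the
// bandwidth saved. 1 KiB is a placeholder threshold; measure yours.
const COMPRESS_THRESHOLD_BYTES = 1024;

function maybeCompress(body: string): { payload: Buffer; encoding?: 'gzip' } {
  const raw = Buffer.from(body, 'utf8');
  if (raw.byteLength < COMPRESS_THRESHOLD_BYTES) return { payload: raw };
  return { payload: gzipSync(raw), encoding: 'gzip' };
}
```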

How to communicate the change to Finance

Executives don’t want a whitepaper; they want a simple storyline. Here’s one I’ve used:

“Cloudflare changed Containers CPU from allocation to usage on November 21, 2025. Our average utilization on customer APIs is 22%, so the CPU portion of those services should fall by ~70–80%. Memory, disk, and egress are unchanged. We’ll ship an autosleep and concurrency cap today to lock the gains. We’ve set alerts if usage spikes.”

If you need help connecting the dots between infra and business outcomes, our team at Bybowu Services does this weekly for engineering orgs and FinOps teams.

A simple calculator you can copy

Per service, open a spreadsheet and add columns:

  1. Instance vCPU ceiling (e.g., 1, 2, 4).
  2. Live seconds per hour (3600 if always on; otherwise sum per instance).
  3. Avg CPU utilization (0–1).
  4. CPU price ($0.000020).
  5. CPU cost = vCPU × live seconds × utilization × price.

Optionally break into peak and off-peak bands with different utilization. Subtract included CPU minutes for the plan you’re on, and add memory/disk/egress line items as needed. If you want a deeper walkthrough, see our earlier piece Cloudflare Containers Pricing Just Changed: What to Fix Now where we model common scenarios.
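Here’s the same spreadsheet as runnable TypeScript, if that’s more your speed. The service names, hours figure, and included-minutes number are placeholders; plug in your plan’s actual allotment.

```typescript
// One row per service, mirroring the spreadsheet columns above.
interface ServiceRow {
  name: string;
  vcpuCeiling: number;        // column 1: instance vCPU cap
  liveSecondsPerHour: number; // column 2: 3600 if always on
  utilization: number;        // column 3: average CPU, 0..1
}

const CPU_RATE = 0.00002;     // column 4: dollars per vCPU-second

function monthlyCpuCost(rows: ServiceRow[], hoursPerMonth = 730, includedVcpuMinutes = 0): number {
  const gross = rows.reduce(
    (sum, r) => sum + r.vcpuCeiling * r.liveSecondsPerHour * r.utilization * CPU_RATE * hoursPerMonth,
    0,
  );
  const credit = includedVcpuMinutes * 60 * CPU_RATE; // minutes -> vCPU-seconds
  return Math.max(0, gross - credit);
}

console.log(monthlyCpuCost(
  [
    { name: 'api', vcpuCeiling: 1, liveSecondsPerHour: 3600, utilization: 0.2 },
    { name: 'worker', vcpuCeiling: 1, liveSecondsPerHour: 900, utilization: 0.6 },
  ],
  730,
  375, // placeholder: your plan's included vCPU-minutes
)); // ≈ 17.95 dollars per month, CPU only
```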

Operational guardrails to add this week

  • CPU budgets per route/job: Define target and max CPU per request. Alert when a route exceeds the budget.
  • Concurrency controls: Prevent bursty fan-out from running a hundred expensive handlers simultaneously.
  • Autosleep policies tuned by service class: customer-facing APIs vs. internal batch jobs.
  • CI checks for accidental CPU spikes: microbench hot paths after big merges, not just unit tests.

What about Node.js, Rust, Python differences?

Language choice still matters for CPU efficiency. A few practical notes I see in teams:

  • Node.js: Single-threaded by default. Avoid synchronous crypto/compression on the request path; push to workers or background. Use streaming parsers for big payloads.
  • Rust/Go: Great for compute-heavy primitives you call from higher-level services. Consider isolating hot loops behind a small internal service to contain CPU cost.
  • Python: Be mindful of per-request CPU in data munging. Vectorize with C-backed libs or push compute into a compiled microservice.

People also ask: Should we downsize instance types now?

Maybe. If you oversized purely for CPU headroom, you can keep the headroom without paying extra CPU unless used. But if your instance type is mostly about memory, downsizing could cause GC churn or OOMs that erase savings via retries and timeouts. Measure first; change once.

People also ask: How do we keep finance from getting surprised?

Turn on spend alerts and send a weekly CPU utilization digest. Teams that proactively brief Finance earn trust—and more headroom when they need it. If you want a template, reach out via Bybowu Contacts; we’ll share a one-pager we use with clients.

When to reconsider architecture

If you’re doing sustained compute—transcoding, large-model inference, or heavy analytics—the usage switch won’t make those workloads cheap; it just makes them fair. For those, consider offloading to specialized services or dedicating a separate tier with predictable batch windows. Keep your customer-facing APIs lean and I/O-bound where possible.

The bottom line

This Cloudflare Containers pricing change rewards good engineering hygiene. If you reduce wasteful CPU, keep instances asleep when idle, and right-size wisely, your invoice drops. If you ignore busy loops, retry storms, and chatty compression, usage billing will expose the inefficiency. That’s a feature. Treat it as a scoreboard and improve.

What to do next

  • Today: Add CPU utilization, live seconds, and concurrency metrics. Enable spend alerts.
  • This week: Remove busy loops, tune autosleep, and set per-route CPU budgets.
  • This month: Revisit instance types for memory fit; move heavy compute off the request path.
  • Quarterly: Review egress-heavy flows; negotiate architecture changes if bandwidth dominates cost.

Want a fast sanity check on your environment? We help teams model cost and performance tradeoffs and implement the fixes without drama. Explore what we do or browse our engineering blog for more playbooks, including npm token changes and other “do-this-now” infrastructure updates.

FAQ: quick hits for busy teams

Is this change live?

Yes—Cloudflare announced the switch to usage-based CPU pricing for Containers and Sandboxes on November 21, 2025. If you’re on paid plans, start measuring and you’ll see the effect in your next billing cycle.

Do I need to change code to benefit?

No. But you’ll benefit more if you reduce wasteful CPU (polling, redundant parsing, aggressive retries) and allow instances to sleep when idle.

Will scaling horizontally increase my CPU bill?

Only if total work done increases. Spreading the same work across more instances doesn’t increase CPU cost by itself; it may even reduce it if it cuts lock contention and lowers per-request CPU.

How do I avoid cold start penalties while still saving?

Use sensible autosleep thresholds, keep dependencies lean, and precompute heavy artifacts. If you absolutely need warm instances, warm only the endpoints where it pays for itself in conversion or retention.

Isometric illustration of microservices with CPU dial turned down
Written by Viktoria Sulzhyk · BYBOWU
