On December 2, 2025, GitHub made a sweeping change to how GitHub Copilot premium requests are controlled and billed for Enterprise and Team accounts created before August 22, 2025. The automatic account-level $0 budgets many admins relied on were removed and replaced by a premium request paid-usage policy. In plain English: you now decide with a toggle whether to allow paid overages or to block them—no more hidden $0 tripwire silently stopping your developers.
What exactly changed on December 2?
Here’s the thing: before December 2, lots of orgs were protected by a default, account-level $0 budget. Once developers hit their monthly premium request allowance, Copilot’s premium features simply stopped working—no charges. That safety net is gone for affected Enterprise and Team accounts. Now, overage behavior is governed by a single policy (Enabled = bill overages; Disabled = block usage) and, increasingly, by dedicated SKUs for each AI tool (for example, Spark has its own premium request SKU as of November 1, 2025). You can still set budgets and alerts, but the default automatic block is no longer the guardrail you think it is.
Why this matters: if your policy is Enabled and you haven’t set sensible budgets, overages can accumulate—especially with models that have higher multipliers or with features like Spark that consume multiple premium requests per prompt.
How GitHub Copilot premium requests work (and where teams get tripped up)
Every plan includes a monthly allowance of premium requests. Current published figures show: Copilot Free (50/month), Copilot Pro (300), Copilot Pro+ (1500), Copilot Business (300 per user), and Copilot Enterprise (1000 per user). If you enable overages, additional premium requests are billed at $0.04 USD per request. Some features and models count more than 1 due to multipliers, so that $0.04 is just the starting point.
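As a rough illustration of the allowance-then-overage arithmetic above, here is a minimal Python sketch. The plan keys and the `overage_cost` helper are hypothetical names chosen for this example, and it deliberately ignores multipliers; the allowances and the $0.04 price are the figures quoted in this article and may change.

```python
from decimal import Decimal

# Monthly premium request allowances as quoted in this article.
# The dictionary keys are illustrative labels, not official plan identifiers.
ALLOWANCE = {
    "free": 50,
    "pro": 300,
    "pro_plus": 1500,
    "business": 300,      # per user
    "enterprise": 1000,   # per user
}
OVERAGE_PRICE_USD = Decimal("0.04")  # per premium request beyond the allowance


def overage_cost(plan: str, requests_used: int) -> Decimal:
    """Return one user's overage charge in USD for one month (no multipliers)."""
    billable = max(0, requests_used - ALLOWANCE[plan])
    return billable * OVERAGE_PRICE_USD


print(overage_cost("business", 450))    # 150 requests over the allowance
print(overage_cost("enterprise", 800))  # still inside the allowance
```

Using `Decimal` rather than floats keeps the cents exact, which matters once these numbers end up in a finance spreadsheet.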
Included models for paid plans currently list GPT‑4.1, GPT‑4o, and GPT‑5 mini as consuming 0 premium requests. Use them and you won’t touch your allowance. Choose other models and multipliers kick in. Examples you’ll see in the docs: Claude Sonnet 4/4.5 (1×), Gemini 2.5 Pro (1×), GPT‑5 (1×), Grok Code Fast 1 (0.25×), Claude Haiku 4.5 (0.33×), and Claude Opus 4.1 (10×). A single Opus 4.1 chat could count as ten premium requests on a paid plan. That stacks quickly.
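To make the multiplier effect concrete, the sketch below converts a mix of interactions into premium requests. It assumes the multipliers quoted above; the lowercase model keys and the `premium_requests` function are hypothetical labels for this example, not API identifiers.

```python
from decimal import Decimal

# Multipliers as quoted in this article for paid plans; the published list can change.
# Keys are illustrative labels only.
MULTIPLIER = {
    "gpt-4.1": Decimal("0"),
    "gpt-4o": Decimal("0"),
    "gpt-5-mini": Decimal("0"),
    "grok-code-fast-1": Decimal("0.25"),
    "claude-haiku-4.5": Decimal("0.33"),
    "claude-sonnet-4.5": Decimal("1"),
    "gemini-2.5-pro": Decimal("1"),
    "gpt-5": Decimal("1"),
    "claude-opus-4.1": Decimal("10"),
}


def premium_requests(interactions: dict[str, int]) -> Decimal:
    """Sum premium requests for a {model: interaction_count} mapping."""
    return sum(
        (MULTIPLIER[model] * count for model, count in interactions.items()),
        Decimal("0"),
    )


# 40 chats on an included model, 12 on a 1x model, 3 on a 10x model.
usage = {"gpt-4.1": 40, "claude-sonnet-4.5": 12, "claude-opus-4.1": 3}
print(premium_requests(usage))  # 0 + 12 + 30
```

In this example the three Opus chats account for most of the consumption, even though they are the smallest share of interactions.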
Do unused requests roll over? No. The meter resets on the first of the month.
What if you run out and overages are disabled? Features that consume premium requests are blocked, but developers can still use included models (subject to rate limits). If overages are enabled and you have a payment method, premium usage continues and you’re billed per request.
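The behavior described above can be read as a small decision table. The following sketch encodes it under simplifying assumptions: it ignores budgets and rate limits, and the `Outcome` enum and function name are invented for illustration rather than taken from any GitHub API.

```python
from enum import Enum


class Outcome(Enum):
    INCLUDED = "served from the monthly allowance"
    BILLED = "served and billed as overage"
    BLOCKED = "blocked; included models remain available"


def premium_request_outcome(
    remaining_allowance: int,
    paid_usage_enabled: bool,
    has_payment_method: bool,
) -> Outcome:
    """Classify what happens to the next premium request (budgets not modeled)."""
    if remaining_allowance > 0:
        return Outcome.INCLUDED
    if paid_usage_enabled and has_payment_method:
        return Outcome.BILLED
    return Outcome.BLOCKED


print(premium_request_outcome(0, paid_usage_enabled=False, has_payment_method=True))
```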
30‑minute checklist to protect budgets without blocking developers
Run this in a single sitting. Timebox: 30 minutes.
- Confirm the policy toggle. In enterprise/org Copilot settings, find “Premium request paid usage.” Set it to Enabled if uninterrupted access is your priority and you’ve set budgets; set it to Disabled if you must block charges while you get controls in place.
- Add a Bundled budget and alerts. Create a monthly premium request budget with alerts at 75%, 90%, 100%. If you’re unsure where to start, choose a round number that equals roughly 10–20% of last month’s total usage.
- Set per‑SKU budgets. Spark now has its own SKU; coding agent and others are rolling out. Create individual SKU budgets so one feature can’t starve the rest.
- Turn on “stop usage when budget is reached” where appropriate. Use this on noncritical orgs or cost centers to avoid bill shock. Keep mission‑critical orgs in “allow” mode with a sensible cap.
- Lock model policy defaults. Start with included models (GPT‑4.1, GPT‑4o, GPT‑5 mini). Allow 1× models selectively. Gate 10× models (e.g., Claude Opus 4.1) behind a separate policy or role.
- Enable auto model selection in Copilot Chat (VS Code) for paid users. It applies a small multiplier discount in chat and nudges usage toward efficient options.
- Download last month’s usage report. Identify the top 10 users and top features (SKU) driving spend. Reach out with tips or limits rather than blunt bans.
- Document “Usage billed to.” If a user belongs to multiple orgs/enterprises, they must select the correct billing entity or their requests won’t route as you expect.
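If you mirror the 75/90/100 alert curve from the checklist in your own tooling (for example, a Slack digest), the threshold check might look like the sketch below. This is not how GitHub evaluates alerts internally; the function name and the dollar figures are assumptions for illustration.

```python
ALERT_THRESHOLDS = (75, 90, 100)  # percent of budget, as in the checklist above


def crossed_alerts(spend_usd: int, budget_usd: int) -> list[int]:
    """Return every alert threshold (in percent) that current spend has reached."""
    if budget_usd <= 0:
        raise ValueError("budget must be positive")
    # Compare spend * 100 against threshold * budget to avoid float rounding.
    return [t for t in ALERT_THRESHOLDS if spend_usd * 100 >= t * budget_usd]


print(crossed_alerts(46, 50))  # 92% of a hypothetical $50 budget
print(crossed_alerts(10, 50))  # 20% of the same budget
```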
People also ask: quick answers for your CFO and VP Eng
How many premium requests do we actually get?
Per the current plans: Free 50/month, Pro 300, Pro+ 1500, Business 300 per user, Enterprise 1000 per user. That’s your monthly allowance before any overages.
How much is a premium request?
$0.04 USD per request beyond your allowance, before multipliers. If a model is 1×, that’s $0.04 per interaction. If it’s 10×, that’s $0.40 per interaction. Spark consumes 4 premium requests per prompt, so a single prompt is effectively $0.16 when you’re in overage.
Do unused requests roll over?
No. Each month starts fresh on the first.
What happens if we hit our budget?
If you’ve set the budget to block at 100%, premium features stop for that entity until the next billing period. If you don’t block, usage continues and is billed.
Why did GitHub remove $0 budgets?
To simplify controls as new AI tools gain their own SKUs. The old approach required admins to create and maintain multiple zero-dollar budgets per tool. The policy toggle and per‑SKU budgets reduce that complexity—if you set them up.
Model multipliers: the silent cost driver
Think of multipliers as a translation layer between power and price. On a paid plan, included models (GPT‑4.1, GPT‑4o, GPT‑5 mini) are 0×. You can chat all day without touching your allowance. Most mainstream coding/chat models sit around 1×. Heavy reasoning models can be 10×. If a senior engineer uses a 10× model for iterative pair‑programming, ten back‑and‑forths consume 100 premium requests, and that’s before code review or agents enter the picture.
Practical tip: Default developers to included models in IDE chat. Give staff/principals explicit access to 1× and 10× options for tasks that truly require them (architectural migrations, deep refactoring, incident retros). Announce the policy where developers work—README, Slack, and VS Code workspace settings.
Cost scenarios you can take to finance
Scenario A: 200 developers on Copilot Enterprise (1000 requests/user). You enable overages for flexibility. Let’s say 20% of the team exceeds their allowance by 200 requests each. That’s 40 developers × 200 = 8000 premium requests at $0.04 = $320. If half of those requests came from a 1× model and half from Spark prompts (4 requests each), the overage is (4000 × $0.04) + (1000 prompts × $0.16) = $160 + $160 = $320: the same dollar total, a different usage shape.
Scenario B: A principal engineer conducts six deep design sessions with a 10× model. Six interactions at 10× equals 60 premium requests. If they do that every day of a five‑day work week while in overage, that’s 300 requests, or $12. The takeaway: multipliers matter, but with guardrails the numbers stay sane for high‑value work.
Scenario C: You disable overages org‑wide and forget. Half your developers hit the allowance mid‑month. Premium features silently stop during a critical release. Cycle time slips, and “savings” turn into delay costs. The fix is balanced controls: budgets plus selective blocks.
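For anyone who wants to check or adapt the numbers in Scenarios A and B, the arithmetic can be reproduced in a few lines of Python. The variable names are illustrative, and the five-day week in Scenario B is an assumption that matches the 300-request figure.

```python
from decimal import Decimal

PRICE = Decimal("0.04")        # USD per premium request in overage
SPARK_REQUESTS_PER_PROMPT = 4  # as stated for Spark in this article

# Scenario A: 200 developers, 20% exceed the allowance by 200 requests each.
developers_over = 200 * 20 // 100
overage_requests = developers_over * 200
total_a = overage_requests * PRICE

# Same volume split half on a 1x model, half on Spark prompts.
one_x_cost = (overage_requests // 2) * PRICE
spark_prompts = (overage_requests // 2) // SPARK_REQUESTS_PER_PROMPT
spark_cost = spark_prompts * SPARK_REQUESTS_PER_PROMPT * PRICE

# Scenario B: six 10x interactions a day for five working days (assumed).
requests_b = 6 * 10 * 5
total_b = requests_b * PRICE

print(developers_over, overage_requests, total_a)
print(spark_prompts, one_x_cost + spark_cost)
print(requests_b, total_b)
```

Swapping in your own headcount, overage share, and model mix turns this into a quick estimate you can hand to finance.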
Spark and the coding agent: set separate guardrails
Spark prompts count as 4 premium requests. With Spark now tracked on its own SKU, give it a budget and alerting curve that matches product needs. For example, a frontend prototyping org could have a modest Spark cap with a hard stop; a solutions engineering org might merit a higher cap with no hard stop but a weekly review.
The Copilot coding agent will likewise show up distinctly in usage and budgets as SKUs roll out. Treat it separately: generous caps where it saves hours of toil (build/test boilerplate, migration scaffolds), tighter caps where it’s “nice to have.”
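One way to reason about separate guardrails is to write the intended budgets down as data before clicking through the settings UI. The sketch below models a hard-stop budget for Spark and an alert-only budget for the coding agent; the SKU labels, caps, and spend figures are hypothetical, and the class is a planning aid, not a GitHub API object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkuBudget:
    sku: str         # illustrative label, not an official SKU identifier
    cap_usd: int     # monthly cap in whole dollars
    hard_stop: bool  # True = stop usage when the budget is reached


def is_blocked(budget: SkuBudget, spend_usd: int) -> bool:
    """A SKU is blocked only if its budget has a hard stop and the cap is reached."""
    return budget.hard_stop and spend_usd >= budget.cap_usd


budgets = [
    SkuBudget("spark", cap_usd=100, hard_stop=True),          # modest cap, hard stop
    SkuBudget("coding_agent", cap_usd=500, hard_stop=False),  # higher cap, review only
]
spend = {"spark": 100, "coding_agent": 650}  # made-up month-to-date spend

for b in budgets:
    print(b.sku, "blocked" if is_blocked(b, spend[b.sku]) else "allowed")
```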
Reporting: don’t fly blind
Pull usage reports weekly at the enterprise or org level. Look for spikes by user, feature (SKU), and model. Build a simple dashboard: percent of requests on included models, percent on 1× models, and the small tail on heavy multipliers. Your goal is to keep 70–85% of traffic on included models without hurting developer flow.
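The three dashboard percentages described above can be computed from any export that lists interactions per model. The sketch below assumes rows of `(model, multiplier, interaction_count)`, which is a made-up shape rather than the actual report format, and it groups sub-1× models with the 1× tier as a simplification.

```python
from collections import Counter


def tier(multiplier: float) -> str:
    """Bucket a model by its multiplier."""
    if multiplier == 0:
        return "included"
    if multiplier <= 1:
        return "up_to_1x"
    return "heavy"


def tier_shares(rows: list[tuple[str, float, int]]) -> dict[str, float]:
    """Return the percent of interactions in each tier, rounded to one decimal."""
    counts: Counter[str] = Counter()
    for _model, multiplier, interactions in rows:
        counts[tier(multiplier)] += interactions
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Sample data, invented for illustration.
rows = [
    ("gpt-4.1", 0, 780),
    ("claude-sonnet-4.5", 1, 190),
    ("claude-opus-4.1", 10, 30),
]
shares = tier_shares(rows)
print(shares)
```

With the sample rows, 78% of interactions land on included models, which sits inside the 70–85% target range mentioned above.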
Remind teams with multiple enterprise or org memberships to set “Usage billed to” correctly. We’ve seen usage vanish into the wrong cost center, causing both chargebacks and confusion about why requests “aren’t working.”
Risks, limits, and gotchas
• Multiple memberships: Developers with seats from multiple orgs must pick the correct billing entity in “Usage billed to,” or premium requests won’t route as intended.
• Mobile subscriptions: Individuals who bought Copilot via mobile stores can’t purchase extra premium requests; upgrading to a paid desktop‑managed plan is the path.
• Rate limits still apply to included models, and response times can vary during heavy usage. Don’t assume unlimited throughput because a model is “free.”
• Budgets with a hard stop block everything across that bundle or SKU once the cap is hit. For mixed‑criticality orgs, split budgets or use cost centers to avoid collateral damage.
• No rollover. If you need burst capacity late in the month, enable overages with a budget rather than pre‑spending early.
A pragmatic rollout plan for engineering leaders
Here’s how I’ve implemented this with teams without drama.
- Set defaults that favor included models. Codify in workspace settings and share a one‑pager explaining when to escalate to 1× or 10× models.
- Create two budget layers. A Bundled budget for Copilot overall and per‑SKU budgets for Spark and the coding agent. Alerts at 75/90/100; only mission‑critical orgs skip hard stops.
- Appoint model stewards. One person per org reviews spikes, approves temporary access to expensive models, and updates the policy monthly.
- Instrument and educate. Ship a weekly Slack digest with top models used, spend, and a tip (e.g., auto model selection in VS Code chat yields a small multiplier discount).
- Revisit quarterly. As models change and SKUs expand, retune policies. What is a 10× indulgence today might be a 1× standard next quarter.
Where this intersects strategy
Zooming out, this change fits a broader pattern: cloud AI tools are moving to granular, metered SKUs. That’s good for transparency, but only if you actively steer usage. The teams that win will build small, boring guardrails into onboarding—model defaults, budgets by SKU, and a report that anyone can read—then unleash engineers to do their best work without second‑guessing every prompt.
If you want a deeper dive into the operational implications, our take in this Dec 2 rulebook for admins complements today’s playbook. If you’re looking for rollout guidance and real‑world patterns from the field, we covered the practical reality for engineering leaders. And if you prefer a short briefing, see what went live on Dec 2. For help implementing controls across multiple orgs, see our AI enablement services.
What to do next (today)
• Check your premium request paid usage policy (Enabled or Disabled) for each enterprise and org.
• Create one Bundled budget and at least one per‑SKU budget (Spark first). Turn on 75/90/100 alerts.
• Default developers to included models; gate 10× models behind a request flow.
• Pull a usage report and identify your top 10 users and features by consumption.
• Communicate your policy in Slack and your repos. Share when to escalate to heavier models.
• Book a 30‑minute weekly review until the trendline stabilizes.
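For the usage report step in the list above, ranking consumers takes only a few lines once the report is loaded into memory. This sketch assumes each row is a dict with `user`, `sku`, and `requests` keys; those field names are hypothetical and will need mapping to whatever columns your downloaded report actually contains.

```python
from collections import defaultdict


def top_consumers(rows: list[dict], key: str, n: int = 10) -> list[tuple[str, int]]:
    """Total premium requests by `key` (e.g. "user" or "sku") and return the top n."""
    totals: defaultdict[str, int] = defaultdict(int)
    for row in rows:
        totals[row[key]] += row["requests"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Sample rows, invented for illustration.
rows = [
    {"user": "alice", "sku": "copilot", "requests": 400},
    {"user": "bob", "sku": "spark", "requests": 90},
    {"user": "alice", "sku": "spark", "requests": 120},
    {"user": "carol", "sku": "coding_agent", "requests": 310},
]

print(top_consumers(rows, "user", n=2))
print(top_consumers(rows, "sku"))
```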
The bottom line
December 2 didn’t make Copilot more expensive by default—it made costs more intentional. With clear policies, per‑SKU budgets, and sensible model defaults, you can keep developers unblocked while protecting the budget. Don’t rely on yesterday’s $0 tripwire; own the controls and you’ll get the velocity gains you bought Copilot for—without the month‑end surprises.
