Amazon Nova Forge is the headline developers wanted this week: a practical path to build a custom model—grounded in your data—without pretending you’re a research lab. Announced during AWS re:Invent (December 1–5, 2025), Nova Forge lets you start from Nova checkpoints, mix in your proprietary corpus early, and ship through Amazon Bedrock with the guardrails you already trust. If you’ve been waiting for a concrete way to move beyond prompt‑engineering and basic fine‑tuning, this is it.

Here’s the thing: success with Amazon Nova Forge isn’t about flipping a switch. It’s about sequencing decisions you control—data, architecture, governance, and cost—so you can deliver a credible pilot in weeks, not quarters. Below, I break down what AWS actually shipped, when Nova Forge is the right tool, and a day‑by‑day plan we’ve used with teams who need results fast.

What actually shipped this week—and why it matters

Three announcements form the spine of a serious Nova roadmap. First, Nova Forge is generally available to help you build your own frontier‑class model by injecting proprietary data at pre‑training, mid‑training, or post‑training phases. That means you can go earlier than typical fine‑tuning and avoid the usual “catastrophic forgetting” tradeoffs by blending with Nova‑curated datasets and reward functions.

Second, Amazon Nova 2 Omni entered preview on December 2, 2025. It’s a multimodal reasoning model that accepts text, images, video, and speech inputs and can output both text and images. It supports a 1M‑token context window, handles 200+ languages for text, and supports speech in multiple languages. In plain terms: you can build workflows that listen, watch, read, and respond—without stitching multiple models.

Third, Amazon Bedrock AgentCore added Policy (preview) and Evaluations (preview) on December 2, 2025, building on its general availability from October. Policy converts natural‑language rules to Cedar and enforces them in real time across tool calls. Evaluations gives you first‑class quality gates so you can move agents from “cool demo” to “safe in prod.” If you plan to ship agents around your custom model, these are the controls you’ll lean on.

Zooming out, AWS also flagged forward‑looking infrastructure moves—like future Trainium generations integrating high‑bandwidth interconnects (including work with NVIDIA NVLink Fusion) and Trainium‑class servers optimized for large‑scale training—plus the Graviton line for general compute. For most teams, that’s a reminder to choose architectures that won’t fight the hardware roadmap.

Whiteboard sketch of Nova Forge training and Bedrock deployment

When should you use Amazon Nova Forge?

Not every AI problem deserves a custom model. Use this quick decision filter to avoid self‑inflicted complexity.

If your use case is narrow and factual

Stick with retrieval‑augmented generation and a steady, cost‑effective base model. Fine‑tune if you need style or format discipline. You probably don’t need early‑stage checkpoint access.

If your use case depends on domain judgment

Think medical coding with nuanced payer rules, claims adjudication, reconciliation logic, or multi‑step agentic workflows that must reason across modalities. That’s where Nova Forge earns its keep: early training lets your domain patterns become native capabilities, not brittle prompt hints.

If your data is rich but noisy

Nova Forge’s ability to bring your proprietary corpus into pre‑ or mid‑training—paired with reinforcement fine‑tuning in your environment—gives you a better shot at signal over noise. It also reduces the “regex‑through‑prompts” smell you sometimes see with late‑stage fine‑tuning.

Your 30‑day Nova Forge plan (from zero to pilot)

This timeline assumes a focused, cross‑functional squad: one lead engineer, one ML engineer, one data steward, one PM/owner, and part‑time security/finance. Adjust for your company’s cadence, but keep the sequencing.

Days 0–2: Access, scoping, and a one‑pager

• Request Nova Forge access in the SageMaker AI console and confirm tagging for the forge-subscription execution role. • Pick one use case that shows up in your OKRs and costs you real money or time today. • Write a brutally clear one‑pager: problem, user, success metric (quality threshold, latency budget, and unit‑cost ceiling), and the “kill switch” if results miss the bar.

Deliverable: a signed one‑pager with the decision to proceed.

Days 3–7: Data audit and first training plan

• Inventory datasets by provenance and license. Label sensitive fields and decide on masking or synthetic augmentation. • Build a 500–2,000 example gold set with tight acceptance criteria; include edge cases the base model fumbles. • Choose where to inject data: pre‑, mid‑, or post‑training. If your knowledge is foundational (terminology, classification, domain primitives), bias earlier.

Deliverable: a data map, risk notes, and a training plan v1 including checkpoints to try.

Days 8–14: Baselines and the cheap path to “good enough”

• Establish baselines using Nova Lite or Micro via on‑demand inference; measure exact‑match accuracy, F1, and task success rate. • If baselines are close, test a Bedrock fine‑tune before you jump to Forge. Keep your CFO happy by starting with the smallest model that meets the bar. • If baselines are weak or brittle, proceed with Nova Forge mid‑training using your curated corpus plus Nova‑curated data. Track catastrophic forgetting tests against general tasks.

Deliverable: a report comparing baseline, fine‑tune, and early Forge results with cost per 1,000 tasks and p95 latency.

Days 15–21: Reinforcement and policy

• Use Nova Forge’s reinforcement fine‑tuning (RFT) loop to reward the behaviors your reviewers care about—especially refusal behavior and chain‑of‑thought structure, if applicable. • Stand up AgentCore Policy (preview) with Cedar rules: who can call what tools, max spend per run, and forbidden destinations. • Integrate AgentCore Evaluations (preview) to catch regressions on your gold set automatically.

Deliverable: a governed model candidate with pass/fail gates and budget controls.

Days 22–30: Wire to production‑minded plumbing

• Deploy via Amazon Bedrock on‑demand endpoints for bursty traffic, or provision throughput if you need low, predictable latency. On‑demand for custom Nova models has been available since mid‑2025—use it to avoid idle capacity. • Add observability: CloudWatch dashboards, structured logs of tool calls, and red‑team traces for postmortems. Pipe to your existing APM (Datadog, Dynatrace) if that’s your norm. • Run a controlled pilot with 50–500 real tasks, measure task success rate and human‑hand‑off percentage. Decide: scale, iterate, or kill.

Deliverable: a production‑minded pilot with clear go/no‑go criteria.

Diagram of Nova Forge training, checkpointing, and Bedrock deployment

Architecture choices you won’t regret later

• Keep data in region. Use VPC endpoints and PrivateLink for Bedrock, SageMaker AI, and storage. That minimizes egress surprises and halves governance debates. • Favor on‑demand inference for pilots and spiky loads. When traffic stabilizes, run a side‑by‑side with provisioned throughput to compare p95 latency and spend. • Separate “agent brain” from tools. Let AgentCore route and police tool calls; keep tools idempotent and observable. You’ll swap models or add agents later without rewriting your business logic.

On training hardware, resist bespoke rigs unless you truly need them. AWS’s Trainium roadmap and high‑bandwidth interconnect work with partners point to bigger, tighter clusters ahead. Design your pipelines to take advantage of those without pinning yourself to one instance type.

Cost and performance guardrails

Costs sneak up through three paths: data prep labor, training runs that try to “learn everything,” and inference bloat. The countermeasures are straightforward.

• Start small and move up only when tests demand it. Nova families are tiered (Micro, Lite, Pro, and new 2‑series variants). Pick the cheapest model that clears your goals. • Token budgets beat guesswork. For inference, pay attention to input and output tokens; for agent workflows (e.g., Nova Act), meter by agent‑hours and enforce ceilings in AgentCore. • Make latency a budget item. Track p95 and p99 alongside cost per task. If you normalize on “dollars per resolved case,” trade up to a bigger model only when ROI is obvious.

Governance that actually ships

AgentCore is the missing middle between “we trust the model” and “security won’t sign off.” Use Policy to write rules in plain English that compile to Cedar and enforce in real time—every tool call, every run. Keep policies in version control with change reviews. Use Evaluations on every candidate build and wire in a red‑team suite that attacks prompts, tools, and outputs.

If you expose endpoints publicly, pair your Bedrock front door with bot defense and abuse safeguards. We’ve outlined a pragmatic approach in our take on Cloudflare AI bot protection; it’s the difference between a stable pilot and a help‑desk flood.

KPIs and tests that prove you’re ready

Ready beats perfect. Prove you’re ready with a small set of stubborn metrics:

• Task success rate on your gold set (target your ops threshold, not a leaderboard). • Regression budget: any improvement must not degrade core safety or reliability beyond a set delta. • Cost per resolved task at p95 latency: this tells finance and engineering the same story. • Human hand‑off rate: show where agents should escalate and how fast. • Drift checks monthly: detect data changes before your users do.

Risks and edge cases (and how to handle them)

• Catastrophic forgetting: blend domain data with general data and run regression tests against generic capabilities. • Hallucinations under tool pressure: enforce AgentCore policies to throttle tool fan‑out and require evidence for claims. • Multimodal surprises: with Nova 2 Omni (preview), be explicit about allowed input types; restrict video and speech if you can’t review them yet. • Privacy: tokenize or mask sensitive fields before training, and segregate eval data from training data to keep your tests honest.

Comparing your options in plain English

• Off‑the‑shelf model + RAG: cheapest, fastest to ship; great for knowledge lookup and templated reasoning. • Bedrock fine‑tune: adds format/style control and mild domain adaptation at modest cost. • Nova Forge: heavier lift and subscription, but best for native domain reasoning—especially for agent workflows that must make multi‑step decisions reliably.

Make the most of what’s new

Two recent milestones are easy wins if you plan for them. First, on‑demand deployment for custom Nova models means your pilot doesn’t need idle capacity sitting around—pay per request until traffic stabilizes. Second, AgentCore’s Policy and Evaluations (both preview) give you a standardized way to prove safety and quality, which saves weeks of internal debate with security and compliance.

What to do next

• Pick one use case and write the one‑pager with success metrics and a kill switch. • Get Nova Forge access and tag your roles correctly. • Build a gold set that includes real edge cases and refusal scenarios. • Run a baseline on Nova Lite/Micro via Bedrock on‑demand before you train. • If you move to Forge, start with mid‑training and iterate with RFT. • Turn on AgentCore Policy/Evaluations early; don’t bolt them on later. • Pilot with real traffic, measure cost per resolved task, and be ruthless with go/no‑go.

Where we can help

If you want a deeper build guide for multimodal reasoning, read our hands‑on take: Amazon Nova 2 Omni: A Builder’s Playbook. Planning to ship agents into production? Our AgentCore adoption guide covers security patterns, observability, and rollout sequencing. And if your architecture spans providers, our AWS Interconnect multicloud 30‑day plan explains how to keep latency and egress in check across clouds.

Need a partner to cut the learning curve? See what we build on our portfolio and reach out via our contact page. We’ll help you scope a pilot that’s realistic on time, cost, and impact.

Team reviewing AI pilot metrics dashboard

Amazon Nova Forge: The Builder’s 30‑Day Plan

What actually shipped this week—and why it matters