At AWS re:Invent in Las Vegas (December 1–5, 2025), Amazon introduced Amazon Nova Forge, a program for enterprises to create custom Nova models with access to pre‑trained checkpoints and curated datasets—then deploy through Bedrock with production governance. Pricing has been widely reported as an annual subscription around the six‑figure mark, which puts Nova Forge squarely in “serious pilot” territory rather than a casual experiment. If your leadership is asking whether this is the moment to build a proprietary model, here’s the sober, on‑the‑ground view you need.
What exactly is Amazon Nova Forge?
Amazon Nova Forge gives organizations a structured path to create domain‑specific Nova models by blending proprietary data into training at different stages (pre‑training, mid‑training, or post‑training) alongside Amazon‑curated corpora. Unlike standard fine‑tuning, Forge aims to move your differentiation earlier in the stack so the model internalizes your taxonomy, workflows, and tone rather than merely adapting output style.
It slots into the broader Nova family announced at the same event, including Nova 2 Lite (fast reasoning with a reported million‑token context), Nova 2 Sonic for speech‑to‑speech, and Nova 2 Omni in preview for multimodal reasoning. The key promise: combine Nova’s capabilities with your institutional knowledge to produce a model your competitors can’t trivially replicate with prompts.
Why this matters now (and not six months from now)
Three shifts landed in the same week and they matter beyond the hype:
First, AWS tightened the loop between model customization and deployment. Bedrock gained stronger evaluation and policy controls—see our take on Bedrock AgentCore quality and policy evaluations—so the governance story around custom models is finally workable for enterprises that must show their homework to security and audit.
Second, Nova Act (AWS’s agent for UI workflow automation) moved into the spotlight with a focus on reliability for real browser tasks. That matters if your “agent” needs to actually click things and book things, not just explain how it would. A custom Nova model behind a reliable agent makes bots that do, not just bots that talk.
Third, the data plane is catching up. Amazon S3 Vectors became generally available, scaling to billions of vectors per index with sub‑second queries. We’ve been tracking that evolution closely in our S3 Vectors GA analysis; pairing a custom model with a first‑party vector store simplifies RAG, retrieval‑augmented agents, and eval harnesses at enterprise scale.
Is Amazon Nova Forge right for us?
Here’s the truth: most teams don’t need Nova Forge on day one. But some do—especially those with heavy compliance needs, specialized jargon, or workflows where hallucinations are costly. Use this decision tree:
Choose prompt engineering only if your use case is lightweight (FAQ bots, summarization, drafts), your data is public or generic, and speed trumps marginal gains.
Choose supervised fine‑tuning on a managed model if you have labeled task datasets (instructions, conversations, code diffs) and you want better guardrails and tone without altering the model’s core capabilities.
Choose retrieval/RAG + vector search when accuracy depends on fresh, proprietary documents (policies, catalogs, tickets). This keeps the base model stable while your knowledge evolves in storage.
Choose Amazon Nova Forge when your differentiation is structural—terminology, reasoning steps, dialog etiquette, or multi‑turn task flows—and you’re willing to invest in curation, safety reviews, and long‑cycle evaluation to earn that edge. Think underwriting, pharmacovigilance case handling, airline ops recovery, or B2B support with product matrices that change weekly.
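To make the trade‑offs concrete, here’s a minimal sketch that encodes the decision tree above as code. The inputs and the ordering are illustrative assumptions, not AWS guidance; adapt them to your own rubric.

```python
from dataclasses import dataclass

@dataclass
class UseCase:
    """Illustrative inputs for the customization decision tree above."""
    lightweight_task: bool             # FAQ bots, summaries, drafts
    has_labeled_task_data: bool        # instructions, conversations, code diffs
    needs_fresh_documents: bool        # policies, catalogs, tickets
    structural_differentiation: bool   # terminology, reasoning steps, task flows
    can_fund_curation_and_evals: bool  # curation, safety review, long-cycle eval

def recommend_approach(uc: UseCase) -> str:
    # Order matters: the cheapest viable option wins.
    if uc.lightweight_task and not uc.structural_differentiation:
        return "prompt engineering"
    if uc.needs_fresh_documents:
        return "RAG + vector search (keep volatile facts out of weights)"
    if uc.has_labeled_task_data and not uc.structural_differentiation:
        return "supervised fine-tuning on a managed model"
    if uc.structural_differentiation and uc.can_fund_curation_and_evals:
        return "Nova Forge candidate (run it against a tuned baseline first)"
    return "not ready: invest in data curation before customizing"
```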
How Nova Forge fits with Bedrock AgentCore and Nova Act
If you adopt Forge, you’ll ship faster by treating the model as the engine and AgentCore as the chassis. AgentCore’s policy, tool‑use, and quality evaluation features let you enforce what the car can and can’t do on your roads. Use Nova Act where agents must operate UIs reliably (booking, quoting, QA in browsers). Tie the whole thing back to a retrieval layer using S3 Vectors for semantic context, so your agent reasons with both learned priors and your freshest data.
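What does a policy guardrail actually look like? AgentCore’s real schema isn’t reproduced here; the dict below is a hypothetical shape that illustrates the categories of constraint you’d codify: allowed tools, forbidden actions, PII handling, rate limits, and escalation paths.

```python
# Hypothetical policy shape; NOT AgentCore's actual schema.
# It illustrates the constraint categories discussed above.
AGENT_POLICY = {
    "allowed_tools": ["search_runbooks", "draft_rca", "query_vectors"],
    "forbidden_actions": ["write_ticket", "close_incident"],  # drafts only
    "pii": {"in_prompts": False, "in_logs": "redacted"},
    "rate_limits": {"tool_calls_per_task": 20, "agent_hours_per_day": 8},
    "escalation": {
        "on_low_confidence": "route_to_human",
        "on_policy_violation": "halt_and_page_oncall",
    },
}
```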
We’ve already published guidance on both the data plane and policy layer—start with RAG at billion‑vector scale and deploying trusted agents with AgentCore. If you’re navigating multicloud networks for cross‑cloud agents, our AWS Interconnect + Google breakdown covers the network path choices you’ll face.
Data and numbers to anchor your plan
Across re:Invent 2025, AWS emphasized concrete improvements: Nova 2 Lite’s million‑token context, Nova Act’s reliability focus for UI workflows, S3 Vectors scaling to billions of vectors per index with sub‑second queries, and Trainium3 UltraServers designed to cut training time and energy use versus the prior generation. For planning, the key is how these reduce time‑to‑usefulness. In practice, we’ve seen three levers matter most:
1) Context length lets you evaluate large, messy inputs (procedures, contracts) without brittle chunking.
2) Agent reliability determines whether a human has to babysit the workflow, killing ROI.
3) Vector store scale and latency decide how quickly your custom model can ground itself in the right facts.
Practical framework: the 7‑C checklist for custom models
Use this before you pitch Nova Forge to the exec team:
1) Clarity: Define one measurable, production‑worthy task (e.g., “reduce Tier‑2 ticket handle time by 30% in 90 days”). If you can’t measure it, don’t customize a model for it.
2) Corpus: Inventory what you’ll teach the model—SOPs, transcripts, tickets, knowledge articles, error codes. Classify sensitivity levels and legal constraints. Identify redlines (never to be used in training) vs. greenlines (safe to use).
3) Curation: Normalize and deduplicate. Remove outdated guidance and route volatile facts to retrieval instead of embedding them into weights.
4) Constraints: Define policy and tool‑use guardrails in AgentCore: allowed actions, rate limits, PII handling, and failure escalations.
5) Circuit‑breakers: Set automatic kill‑switches on quality drop: rollback checkpoints, disable certain skills, fallback to RAG‑only responses.
6) Comparators: Establish baselines (prompt‑only, FT‑only) and gold‑label test sets. If Forge can’t beat your tuned baseline by a pre‑agreed margin, you don’t ship (a gate sketch follows this checklist).
7) Costing: Model different traffic and evaluation mixes. Include hidden costs: data engineering, labeling, security review, and agent runtime (e.g., per‑agent‑hour for Nova Act). The subscription fee isn’t the whole story.
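Item 6 deserves teeth. Here’s a minimal go/no‑go gate, assuming you score both models on the same gold‑label set; the margin and the scores in the example are placeholders you’d negotiate up front.

```python
def ship_gate(forge_score: float, baseline_score: float,
              agreed_margin: float, safety_regressions: int) -> bool:
    """Go/no-go from checklist items 5-6: the Forge candidate must beat the
    tuned baseline by the pre-agreed margin with zero safety regressions."""
    return (forge_score - baseline_score) >= agreed_margin and safety_regressions == 0

# Example: a 4-point lift against a 5-point agreed margin does not ship.
assert ship_gate(0.82, 0.78, agreed_margin=0.05, safety_regressions=0) is False
```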
30/60/90: a pilot plan you can actually run
Days 0–30: Narrow the scope and stand up the harness
Pick one workflow: say, “generate a draft RCA for incidents with attached CloudWatch metrics and on‑call notes.” Build a tiny eval set: 100 historical cases with accepted RCAs. Define pass/fail thresholds (precision on causes, quality on remediation). Stand up S3 Vectors for retrieval of runbooks and prior RCAs. Configure AgentCore policies to prevent ticket changes—drafts only.
Data work here is 70% of the time: redact PII, version datasets, and tag provenance. If your legal team isn’t in the room yet, you’re already behind.
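Here’s what the harness skeleton can look like in practice. This is a sketch, not a framework: the token‑overlap scorer is a deliberately crude stand‑in you’d replace with SME rubrics, and the case field names are assumptions about your dataset.

```python
from statistics import mean

# Minimal harness for the 100-case RCA eval set above. Thresholds map to the
# pass/fail bars you defined (precision on causes, quality on remediation).
THRESHOLDS = {"cause_precision": 0.80, "remediation_quality": 0.75}

def overlap(draft: str, accepted: str) -> float:
    """Crude token-overlap stand-in; replace with your SME rubric."""
    d, a = set(draft.lower().split()), set(accepted.lower().split())
    return len(d & a) / max(len(a), 1)

def run_eval(cases: list[dict], generate_rca) -> dict:
    """generate_rca is the model under test: prompt-only, FT, or Forge."""
    scores = {k: [] for k in THRESHOLDS}
    for case in cases:
        draft = generate_rca(case)
        scores["cause_precision"].append(overlap(draft, case["accepted_causes"]))
        scores["remediation_quality"].append(overlap(draft, case["accepted_fix"]))
    report = {k: round(mean(v), 3) for k, v in scores.items()}
    report["pass"] = all(report[k] >= t for k, t in THRESHOLDS.items())
    return report
```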
Days 31–60: Train, compare, and attack the error bars
Run three contenders: baseline prompt‑only, fine‑tuned managed model, and your Nova Forge candidate. Use the same eval set. Track false confidence, unsupported claims, and missing steps. Add active learning: harvest human corrections from SMEs to expand the dataset. Pressure‑test agents in shadow mode—no production writes.
Expect your first pass to underwhelm. The teams that win instrument error analysis early and fix the top three failure modes ruthlessly.
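Instrumenting that error analysis is mostly bookkeeping. A sketch, assuming you log failure labels per case for each contender:

```python
from collections import Counter

# Tally failure modes per contender so you can attack the top three first.
FAILURE_MODES = ("false_confidence", "unsupported_claim", "missing_step")

def error_profile(results: list[dict]) -> dict[str, Counter]:
    """results: [{"model": "baseline", "failures": ["missing_step", ...]}, ...]"""
    profile: dict[str, Counter] = {}
    for r in results:
        profile.setdefault(r["model"], Counter()).update(
            f for f in r["failures"] if f in FAILURE_MODES)
    return profile

profile = error_profile([
    {"model": "forge", "failures": ["missing_step", "missing_step"]},
    {"model": "baseline", "failures": ["unsupported_claim"]},
])
print(profile["forge"].most_common(3))  # the failure modes to fix first
```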
Days 61–90: Guardrails, go/no‑go, and a narrow launch
Wire circuit‑breakers in AgentCore. Run a two‑week A/B: agent‑drafted RCAs vs. human‑only, with reviewers blind to source. If the Forge model clears your quality bar and cuts time‑to‑first‑draft by ≥30%, launch to a subset of services and start measuring defect escape rate and edit distance per draft.
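Two of those launch metrics are cheap to compute with the standard library. A sketch using difflib for edit effort; the 30% bar mirrors the threshold above:

```python
import difflib

def edit_effort(draft: str, final: str) -> float:
    """Share of the draft a reviewer had to change (0.0 = accepted as-is)."""
    return 1.0 - difflib.SequenceMatcher(None, draft, final).ratio()

def time_to_draft_cut(agent_minutes: list[float], human_minutes: list[float]) -> float:
    """Relative reduction in time-to-first-draft; ship only if >= 0.30."""
    agent = sum(agent_minutes) / len(agent_minutes)
    human = sum(human_minutes) / len(human_minutes)
    return (human - agent) / human
```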
Gotchas we’ve seen in real projects
Checkpoint overreach: Pushing too much volatile knowledge into weights increases drift risk. Move dynamic product facts to RAG and keep the model focused on stable reasoning patterns.
Shadow PII: Transcripts and tickets hide secrets in free‑text fields. Build PII detection into your data pipeline and keep raw sources on need‑to‑know storage tiers.
Eval blind spots: BLEU‑ish metrics won’t catch “sounds right, wrong step.” Define domain rubrics with SMEs. Reward correct steps in the right order, not pretty prose (see the rubric sketch after these gotchas).
Agent flakiness: UI changes break automations. Nova Act is more reliable than duct‑taped headless browsers, but you still need visual/regression tests for the flows you automate.
Cost illusions: The subscription is predictable; the human loop isn’t. Budget for weekly eval runs, dataset updates, and policy audits.
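On the eval blind spot specifically, here’s the rubric sketch promised above. It scores correct steps in the correct order via longest common subsequence, so a fluent but mis‑ordered procedure loses credit. It assumes step labels are normalized upstream.

```python
def ordered_step_score(predicted: list[str], expected: list[str]) -> float:
    """Reward correct steps in the right order (longest common subsequence),
    so 'sounds right, wrong step' scores poorly even when the prose is pretty."""
    m, n = len(predicted), len(expected)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if predicted[i] == expected[j]
                                else max(dp[i][j + 1], dp[i + 1][j]))
    return dp[m][n] / max(n, 1)

# A draft that lists every step but swaps the order loses credit:
print(ordered_step_score(["drain", "failover", "verify"],
                         ["failover", "drain", "verify"]))  # ~0.67
```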
People also ask
Is Nova Forge just fancy fine‑tuning?
No. Fine‑tuning adapts a model’s output behavior given instructions and examples. Nova Forge aims earlier—at training checkpoints—so the model internalizes your domain concepts. It’s closer to teaching than tutoring.
How much data do we need to justify Amazon Nova Forge?
Think in quality hours, not terabytes. A few thousand well‑curated cases with consistent rubrics can outperform a million noisy records. If you can’t curate, don’t customize; invest in RAG first.
Will Nova Forge lock us into AWS?
You’ll be coupling to Bedrock deployment and AWS governance features. That’s a trade‑off many enterprises accept for speed, security posture, and network proximity to existing datasets. If multicloud is non‑negotiable, design for abstraction at the agent and retrieval layers, and keep export paths for embeddings and eval assets.
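A sketch of what that abstraction can look like, using Python protocols; the interface names are our own, not an AWS or vendor API:

```python
from typing import Protocol

class Retriever(Protocol):
    def search(self, query: str, top_k: int) -> list[str]: ...

class ChatModel(Protocol):
    def complete(self, system: str, user: str) -> str: ...

# Agents depend on these interfaces, not on Bedrock or S3 Vectors directly,
# so swapping the model source (managed Nova -> Forge -> elsewhere) is a
# one-adapter change, and embeddings and eval assets stay exportable.
def answer(model: ChatModel, retriever: Retriever, question: str) -> str:
    context = "\n".join(retriever.search(question, top_k=5))
    return model.complete("Answer from the provided context only.",
                          f"Context:\n{context}\n\nQuestion: {question}")
```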
Architecture sketch: a sane v1
Start with a managed Nova model for early demos. Layer RAG via S3 Vectors to ground responses in your docs. Add AgentCore for tools, policies, and evals. When your use case outgrows fine‑tuning, graduate the same harness to Nova Forge so your data and evals don’t change—only the model source does. That continuity is how you keep momentum.
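Here’s a compact sketch of that v1 in boto3. The model IDs, bucket, and index names are placeholders; the Bedrock Converse and Titan embedding calls are established APIs, while the s3vectors parameter names follow the preview documentation and should be verified against the current reference before you depend on them.

```python
import json
import boto3

brt = boto3.client("bedrock-runtime")
s3v = boto3.client("s3vectors")  # verify parameter names against current docs

def embed(text: str) -> list[float]:
    """Embed a query with a managed embedding model (placeholder model ID)."""
    resp = brt.invoke_model(modelId="amazon.titan-embed-text-v2:0",
                            body=json.dumps({"inputText": text}))
    return json.loads(resp["body"].read())["embedding"]

def grounded_answer(question: str) -> str:
    # Retrieve semantic context from S3 Vectors (names are placeholders).
    hits = s3v.query_vectors(vectorBucketName="acme-kb-vectors",
                             indexName="runbooks",
                             queryVector={"float32": embed(question)},
                             topK=5, returnMetadata=True)
    context = "\n".join(v["metadata"]["text"] for v in hits["vectors"])
    # Start on a managed Nova model; graduate the same harness to Forge later.
    resp = brt.converse(modelId="amazon.nova-lite-v1:0",
                        messages=[{"role": "user", "content": [{"text":
                            f"Context:\n{context}\n\nQuestion: {question}"}]}])
    return resp["output"]["message"]["content"][0]["text"]
```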
Security and compliance realities
Bring Security in on Day 1. Define classification levels, retention, and allowed uses. Require dataset manifests with provenance and consent flags. In AgentCore, codify what’s off‑limits: no PII in prompts, no unredacted logs, no outbound calls to unapproved endpoints. Finally, measure “safety regression”: when you improve accuracy, did you worsen policy compliance?
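A dataset manifest doesn’t need to be elaborate to be enforceable. A sketch with illustrative field names (this is our shape, not a mandated AWS format):

```python
from dataclasses import dataclass

@dataclass
class DatasetManifest:
    """One manifest per training source. Field names are illustrative."""
    source: str              # e.g., "tier2-tickets-2024"
    classification: str      # "public" | "internal" | "restricted"
    provenance: str          # system of record plus extraction date
    consent_confirmed: bool  # legal sign-off for training use
    redline: bool            # True = never to be used in training
    retention_days: int
    pii_scan_passed: bool = False

def trainable(m: DatasetManifest) -> bool:
    """Gate every Forge ingestion job on the manifest, not on trust."""
    return (not m.redline and m.consent_confirmed and m.pii_scan_passed
            and m.classification != "restricted")
```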
Cost modeling that won’t bite you later
Break costs into four buckets: subscription/licensing, data engineering and labeling, training/tuning cycles (including any Trainium3 or managed compute your team touches), and runtime (model inference, RAG queries, agent hours). Put a monthly ceiling on eval runs and make them predictable—e.g., a fixed 500‑case panel every Friday—so Finance isn’t surprised by spiky experimentation.
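A bucket model you can hand to Finance, as a sketch; every figure is an input you supply, and nothing here reflects actual AWS pricing:

```python
def monthly_cost(subscription: float, data_engineering: float,
                 training_cycles: float, inference: float,
                 rag_queries: float, agent_hours: float,
                 eval_runs: int, cost_per_eval_run: float) -> dict:
    """The four buckets above, plus a capped eval line item (e.g., the fixed
    500-case Friday panel) so experimentation stays predictable."""
    runtime = inference + rag_queries + agent_hours
    evals = eval_runs * cost_per_eval_run
    return {
        "subscription": subscription,
        "data": data_engineering,
        "training": training_cycles,
        "runtime": runtime,
        "evals": evals,
        "total": subscription + data_engineering + training_cycles + runtime + evals,
    }
```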
Where this goes next
Expect a faster cadence of Nova and Bedrock updates in early 2026 and deeper overlaps with security tooling. Also expect your stakeholders to ask why you didn’t just “use GPT like everyone else.” Your answer should be measurable: latency wins from in‑region deployment, policy compliance traceability in AgentCore, and a quality lift that your control group can validate. When those are true, custom models earn their keep.
What to do next
• Pick one workflow you can measure in weeks, not quarters.
• Stand up the eval harness and S3 Vectors before touching model training.
• Pilot with a managed Nova model plus fine‑tuning; treat Nova Forge as the graduation path once your data and rubrics are ready.
• Wire AgentCore policies and circuit‑breakers from day zero.
• Socialize the dashboard: show cost per successful task, not tokens per day.
Need a sparring partner to shape the pilot? See our services overview, review relevant project case studies, and browse more deep dives on the Bybowu blog. If your timeline is aggressive, visit contact us and we’ll help you ship a credible week‑one plan.
