
Amazon Nova Forge: Build vs Buy for AI in 2026

AWS quietly made the most consequential AI move of the quarter: Amazon Nova Forge is now generally available, alongside fresh Nova 2 models and AgentCore upgrades. If you lead engineering or product, this isn’t just another keynote—it's a decision point. Do you keep renting generic models, or start training a model that actually knows your business? Below I break down what changed this week, how Nova Forge really works, when it’s worth it, and a 30–60–90 day plan you can act on.
Published: Dec 05, 2025 · Category: AI · Read time: 11 min

Amazon Nova Forge is here, and it’s not a lab curiosity. It’s a production track for companies that want a model that learns from their domain—from underwriting notes to CAD assemblies—without throwing away general reasoning. Paired with the new Nova 2 family and real improvements in Bedrock AgentCore, the question for 2026 isn’t “Should we try AI?” It’s “Where do we draw the line between buying great general models and building the parts that define our moat?”

I’ve shipped agents into regulated environments, migrated teams off brittle fine‑tunes, and watched cost curves eat otherwise brilliant roadmaps. Here’s the thing: the stack just matured. You can now start earlier in the training lifecycle using Nova checkpoints, wire policies that actually stop agents from wandering, and evaluate quality with built‑in signals instead of vibes. Let’s get practical.

Architecture diagram of a Nova Forge pipeline from data to deployment

What changed this week—and why it matters

Between December 2 and December 4, 2025, AWS rolled out three moves with real impact: Nova 2 models (Lite, Pro in preview, Omni in preview, and Sonic for speech‑to‑speech), the general availability of Amazon Nova Forge, and a slate of Bedrock AgentCore additions—bi‑directional streaming for voice agents, Policy (preview) that compiles to Cedar, and Evaluations (preview) with built‑in metrics. Availability notes matter: Nova Forge is live in US East (N. Virginia) today; Nova 2 Pro and Omni are in preview with early access; Sonic is available in multiple regions; and AgentCore capabilities are landing across nine regions with some features in preview.

For leaders budgeting 2026, that combination changes risk. Until now, you either bought a black‑box model or tried to fine‑tune around the edges. With Nova Forge, you can inject proprietary data at pre‑training, mid‑training, or post‑training. With AgentCore Policy and Evaluations, you finally have guardrails and quality signals that don’t require a parallel infra team.

Amazon Nova Forge, explained

So what exactly is Amazon Nova Forge? It’s a managed path to build domain‑specific “frontier‑class” models starting from Nova checkpoints. You work in SageMaker, blend your data with Amazon‑curated corpora, and train across stages: pre‑training (broad capabilities), mid‑training (domain alignment), and post‑training (instruction, RFT, safety). Because it’s built for the Nova family, you keep reasoning strength while specializing on what your customers actually do.

Practically, you get: early access to Nova 2 Pro/Omni, Reinforcement Fine‑Tuning with reward functions you control, and a responsible AI toolkit for guardrails. You ship the resulting model behind Bedrock endpoints or deploy it into a broader agentic system using AgentCore. That last bit matters—operationalizing the model inside policy, identity, and evaluation loops is where enterprises win or bleed.
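To make the training side concrete, here is a minimal sketch of what kicking off a Forge customization run from SageMaker might look like. This is an assumption-laden illustration: the training image URI, checkpoint identifier, and hyperparameter names are placeholders I invented, since the real values come from the Nova Forge onboarding materials rather than public docs.

```python
import boto3

# Hypothetical sketch of a Nova Forge post-training run launched as a
# SageMaker training job. Image URI, checkpoint name, and hyperparameter
# keys are placeholders, not documented Forge values.
sm = boto3.client("sagemaker", region_name="us-east-1")

sm.create_training_job(
    TrainingJobName="claims-novella-posttrain-001",
    RoleArn="arn:aws:iam::123456789012:role/ForgeTrainingRole",
    AlgorithmSpecification={
        "TrainingImage": "<nova-forge-training-image-uri>",  # placeholder
        "TrainingInputMode": "File",
    },
    HyperParameters={  # assumed names, for illustration only
        "base_checkpoint": "nova-2-lite",
        "stage": "post_training",
        "epochs": "2",
    },
    InputDataConfig=[{
        "ChannelName": "train",
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://my-forge-bucket/claims/post-training/",
        }},
    }],
    OutputDataConfig={"S3OutputPath": "s3://my-forge-bucket/artifacts/"},
    ResourceConfig={
        "InstanceType": "ml.p5.48xlarge",  # sizing is workload-dependent
        "InstanceCount": 1,
        "VolumeSizeInGB": 500,
    },
    StoppingCondition={"MaxRuntimeInSeconds": 8 * 3600},
)
```

The point is less the exact parameters than the shape of the workflow: your curated data lands in S3, the run starts from a Nova checkpoint, and the artifact flows on toward a Bedrock endpoint and your AgentCore loops.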

“What is a Novella—and do we need one?”

In AWS’s parlance, customers can create bespoke variants (“Novellas”) derived from Nova checkpoints. You need one if your errors are consistently domain‑shaped: an insurance bot that confuses per‑occurrence and aggregate limits, a support agent that misreads obscure device log codes, or a developer assistant that must internalize a decade of internal APIs. If you’re solving generic summarization or marketing copy, keep buying. If your competitive edge sits in tacit knowledge, build.

Build vs. buy: a fast decision framework

Use this three‑gate filter to decide where Amazon Nova Forge belongs in your stack:

Gate 1: Differentiation density. How concentrated is your proprietary knowledge in the expected output? If 80% of the task relies on public competence and 20% on your data, buy a general model and add retrieval. If it’s inverted—claims adjudication heuristics, chem design rules, factory troubleshooting—start a Forge pilot.

Gate 2: Error tolerance and observability. Can you tolerate 1–2% domain‑critical mistakes? If no, you’ll need better priors than RAG can offer; Forge plus Evaluations gives you repeatable tests tied to real traces. If yes, keep it simple and avoid ownership costs.

Gate 3: Unit economics. Are you pushing sustained, high‑volume inference where quality shifts change revenue or tickets per hour? If the answer is yes, model ownership—even partial—often pays back via fewer tool calls, shorter chains, and less human escalation.
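If you want to make the gates explicit in a planning doc, a toy scoring helper like the one below keeps the conversation honest. The thresholds and field names are mine, not AWS guidance; tune them to your own economics.

```python
from dataclasses import dataclass

@dataclass
class UseCase:
    """Toy model of the three gates; thresholds are illustrative, not prescriptive."""
    proprietary_share: float      # Gate 1: fraction of output quality driven by your data (0-1)
    error_tolerance: float        # Gate 2: acceptable domain-critical error rate (e.g., 0.02)
    monthly_inference_calls: int  # Gate 3: sustained volume

def forge_recommendation(uc: UseCase) -> str:
    # Gate 1: differentiation density
    if uc.proprietary_share < 0.2:
        return "buy: general model + retrieval"
    # Gate 2: error tolerance
    needs_better_priors = uc.error_tolerance < 0.02
    # Gate 3: unit economics
    high_volume = uc.monthly_inference_calls > 1_000_000
    if needs_better_priors or (uc.proprietary_share >= 0.5 and high_volume):
        return "build: start a Forge pilot"
    return "hybrid: buy now, revisit Forge when volume or accuracy pressure grows"

print(forge_recommendation(UseCase(0.6, 0.01, 3_000_000)))  # -> build: start a Forge pilot
```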

AgentCore just leveled up—and that’s the enabler

I’ve been bullish on Bedrock AgentCore since preview because it tackles the unglamorous parts: identity, tool wiring, observability, and long‑running executions. The recent updates make it production‑ready for voice and regulated sectors. Bi‑directional streaming means your voice agent can listen and speak at once, handle interruptions, and keep context. Policy (preview) compiles your rules to Cedar and intercepts every tool call, so your “no PII to third‑party APIs” rule is enforceable. Evaluations (preview) gives you built‑in assessors for accuracy, tool selection, and more—plus custom scoring—so you can gate deployments on metrics, not demos.
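Here is a sketch of the kind of rule that belongs in Policy, expressed as a Cedar statement plus an app-level mirror you can keep as a belt-and-suspenders check. The Cedar entity and attribute names are assumptions for illustration; the real AgentCore schema is defined in the preview docs.

```python
# Illustrative only: the Action/resource/context names below are NOT documented
# AgentCore schema, just a sketch of what a "no PII to third-party APIs" rule
# could look like once compiled to Cedar.
CEDAR_RULE = """
forbid(
    principal,
    action == Action::"InvokeTool",
    resource
)
when { resource.is_third_party && context.payload_contains_pii };
"""

def preflight_tool_call(tool_is_third_party: bool, payload_contains_pii: bool) -> bool:
    """App-level mirror of the rule above: block PII from reaching third-party tools."""
    return not (tool_is_third_party and payload_contains_pii)

assert preflight_tool_call(tool_is_third_party=True, payload_contains_pii=True) is False
assert preflight_tool_call(tool_is_third_party=False, payload_contains_pii=True) is True
```

The value of the centralized version is that it is enforced on every tool call and auditable, instead of living as scattered if-statements in each service.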

Pair that with marketplace support for Agent‑to‑Agent (A2A) servers and global region coverage, and you get a sane way to integrate partner agents, internal tools, and your Nova‑based models without bespoke glue. If you want a deep dive on how to wire agents and tools, our primer on AgentCore goes step by step in production terms—VPC endpoints, IAM boundaries, the works. Read: Amazon Bedrock AgentCore: A Practical Adoption Guide.

Can I run this in a VPC and keep data private?

Yes. AgentCore services support VPC connectivity, PrivateLink, and CloudFormation resources. You can keep data paths private, lock down egress, and tag resources for cost allocation. For many enterprises, that security posture is the difference between a showcase and a signed-off go‑live.
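As a sketch, standing up an interface endpoint so Bedrock runtime traffic stays on the AWS backbone looks like this. The VPC, subnet, and security group IDs are placeholders, and the AgentCore endpoint service name varies; verify names in the PrivateLink catalog for your region.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Interface endpoint so Bedrock runtime calls never traverse the public internet.
ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0abc1234",
    ServiceName="com.amazonaws.us-east-1.bedrock-runtime",
    SubnetIds=["subnet-0abc1234", "subnet-0def5678"],
    SecurityGroupIds=["sg-0abc1234"],
    PrivateDnsEnabled=True,
    TagSpecifications=[{
        "ResourceType": "vpc-endpoint",
        "Tags": [{"Key": "cost-center", "Value": "ai-platform"}],
    }],
)
# Repeat for the AgentCore endpoint service in your region
# (check the VPC endpoint service catalog for the exact service name).
```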

AgentCore policy and evaluation dashboard concept

Architecture patterns that ship

Pattern A: Domain model + lightweight RAG. Train a Novella to internalize domain primitives (units, codes, schemas), then use a minimal retrieval layer for fresh facts. This reduces hallucinations and tool thrash because the base model “speaks your language” before it fetches.
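A minimal sketch of Pattern A, assuming your Novella is exposed through a Bedrock endpoint and you already have a retrieval layer of your own. The model ARN and the `search_kb` helper are placeholders, not real identifiers.

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
MODEL_ID = "arn:aws:bedrock:us-east-1:123456789012:custom-model/claims-novella"  # placeholder

def search_kb(query: str, k: int = 3) -> list[str]:
    """Placeholder for your retrieval layer (vector store, OpenSearch, Kendra, ...)."""
    return ["<fresh policy bulletin>", "<rate table excerpt>", "<compliance note>"][:k]

def answer(question: str) -> str:
    # Keep retrieval thin: the domain model already knows the primitives,
    # so we only fetch facts that change faster than the training cadence.
    context = "\n\n".join(search_kb(question))
    response = bedrock.converse(
        modelId=MODEL_ID,
        system=[{"text": "Answer using the supplied context; cite the snippet you relied on."}],
        messages=[{"role": "user",
                   "content": [{"text": f"Context:\n{context}\n\nQuestion: {question}"}]}],
        inferenceConfig={"maxTokens": 512, "temperature": 0.2},
    )
    return response["output"]["message"]["content"][0]["text"]
```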

Pattern B: Voice agent for operations. Sonic + AgentCore streaming + Policy. Use Nova 2 Sonic for speech‑to‑speech, stream through AgentCore, and intercept tool calls with Policy for guardrails. Add Evaluations to score calls for helpfulness and correctness; route low scores to human follow‑up. This is viable for field support, contact centers, and shop floors.

Pattern C: Multimodal reasoning for workflows. Omni (preview) for video + text, wrapped in AgentCore Identity so the agent acts on behalf of a user. Think QA on assembly video, with the agent opening tickets and attaching annotated frames—no human stitching of model outputs required.
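A sketch of the multimodal piece of Pattern C using the Converse API with an image content block (a single extracted frame here). The model ID is a placeholder, Omni is in preview, and you would swap in native video input where the preview supports it.

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

with open("assembly_frame_0412.png", "rb") as f:
    frame_bytes = f.read()

# Ask the model to do QA on a frame pulled from the assembly video.
response = bedrock.converse(
    modelId="<nova-2-omni-model-id>",  # placeholder; Omni is in preview
    messages=[{
        "role": "user",
        "content": [
            {"image": {"format": "png", "source": {"bytes": frame_bytes}}},
            {"text": "Does the torque-seal mark on fastener B3 look complete? "
                     "Answer pass/fail and explain in one sentence."},
        ],
    }],
    inferenceConfig={"maxTokens": 300},
)
print(response["output"]["message"]["content"][0]["text"])
```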

Data you can use today

If you’re planning capacity or sequencing teams, pin these facts to your board: Nova Forge became generally available on December 2, 2025, initially in US East (N. Virginia). Nova 2 Lite is available broadly; Nova 2 Pro and Omni are in preview with early access tied to Forge or AWS approval. Nova 2 Sonic is available today in multiple regions and integrates with common telephony stacks and open‑source real‑time frameworks. Bedrock AgentCore added bi‑directional streaming on December 2, 2025, and introduced Policy and Evaluations in preview on the same date. These are not roadmap promises; they’re shipping features you can test this week.

Costs and governance: how to avoid surprise bills

Budget discipline wins adoption battles. Start with hard envelopes: per‑environment spending limits, per‑agent concurrency caps, and per‑feature kill switches. Separate budgets for training (Forge via SageMaker) and inference (Bedrock endpoints). For voice, enforce maximum turn length and end‑of‑dialog heuristics to cap streaming minutes.
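A hard envelope can be as mundane as an AWS Budgets alert per environment. A sketch, assuming you scope inference spend by service (swap in cost-allocation-tag filters if that matches your setup better; the account ID and email are placeholders):

```python
import boto3

budgets = boto3.client("budgets")
ACCOUNT_ID = "123456789012"  # placeholder

# Hard monthly envelope for inference spend in one environment.
budgets.create_budget(
    AccountId=ACCOUNT_ID,
    Budget={
        "BudgetName": "bedrock-inference-staging",
        "BudgetLimit": {"Amount": "3000", "Unit": "USD"},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
        "CostFilters": {"Service": ["Amazon Bedrock"]},  # example filter; scope to your tags
    },
    NotificationsWithSubscribers=[{
        "Notification": {
            "NotificationType": "ACTUAL",
            "ComparisonOperator": "GREATER_THAN",
            "Threshold": 80.0,            # alert at 80% of the envelope
            "ThresholdType": "PERCENTAGE",
        },
        "Subscribers": [{"SubscriptionType": "EMAIL", "Address": "[email protected]"}],
    }],
)
```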

Most overruns happen in two places: unbounded tool loops and careless preview testing. Use AgentCore Policy to cap tool invocations per session and forbid high‑latency tools in live queues. Use Evaluations to create pre‑prod gates: no deployment unless quality clears a target and cost per task stays within range. If you’re wrestling with AI budgeting in general, our practical guides on controlling metered AI usage and avoiding “mystery” invoices will save you hours; start with these budget controls for AI tools and adapt the patterns to Bedrock and SageMaker.
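The gate itself can be boring code. A sketch, assuming you export per-task quality scores and costs from your evaluation runs; the thresholds and metric names are yours to choose.

```python
import boto3
from statistics import mean

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

QUALITY_FLOOR = 0.85       # minimum mean quality score to promote
COST_CEILING_USD = 0.12    # maximum mean cost per task

def promotion_gate(run_results: list[dict]) -> bool:
    """Pass/fail gate over an evaluation run.

    `run_results` is whatever you export from your evaluation tooling:
    one dict per golden task with 'quality' (0-1) and 'cost_usd' fields.
    """
    quality = mean(r["quality"] for r in run_results)
    cost = mean(r["cost_usd"] for r in run_results)

    # Record the run so promotions are auditable alongside your other ops metrics.
    cloudwatch.put_metric_data(
        Namespace="AgentQuality",
        MetricData=[
            {"MetricName": "MeanQuality", "Value": quality, "Unit": "None"},
            {"MetricName": "MeanCostPerTask", "Value": cost, "Unit": "None"},
        ],
    )
    return quality >= QUALITY_FLOOR and cost <= COST_CEILING_USD
```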

Let’s get practical: the Forge Readiness Checklist

Use this one‑pager with your team before you file the PRD:

Data readiness. You can enumerate the 5–10 data sources that define expertise (wikis, tickets, design docs, logs). You’ve mapped licensing and PHI/PII constraints. You can produce 5–10k high‑signal examples for mid/post‑training.

Golden tasks. You’ve written 20–50 tasks that represent “if the model nails these, we make money.” Each has acceptance criteria, expected tool calls, and a target cost and latency (a sketch of one such task record follows this checklist).

Guardrails. You’ve drafted three must‑never rules (e.g., “never email attachments outside our domain,” “only schedule maintenance windows within approved hours,” “no customer data to non‑approved endpoints”) and you’re ready to encode them as Policy.

Observability. Logs and traces are unified in CloudWatch or your provider of choice; you can correlate per‑task cost and quality. Evaluations will record baselines and promotions.

People. An applied ML lead owns training, a platform engineer owns AgentCore and networking, and a product manager owns golden tasks and rollout. No shared ownership blur.
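For the golden tasks item above, a record per task can stay this simple. The field names are a suggestion, not a standard; the insurance example is illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class GoldenTask:
    """One 'if the model nails this, we make money' task. Field names are illustrative."""
    task_id: str
    prompt: str
    acceptance_criteria: list[str]   # human-checkable statements
    expected_tool_calls: list[str]   # tools the agent should (and should not) reach for
    max_cost_usd: float
    max_latency_s: float
    tags: list[str] = field(default_factory=list)

TASKS = [
    GoldenTask(
        task_id="claims-017",
        prompt="A policyholder asks whether water damage from a burst pipe is covered "
               "under policy form HO-3 with a $1,000 deductible. Draft the reply.",
        acceptance_criteria=[
            "Distinguishes sudden/accidental discharge from gradual seepage",
            "States the deductible correctly",
            "Does not quote a payout amount",
        ],
        expected_tool_calls=["policy_lookup"],
        max_cost_usd=0.08,
        max_latency_s=6.0,
        tags=["claims", "coverage"],
    ),
]
```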

Risks and gotchas (from actual rollouts)

Catastrophic forgetting. When you specialize too aggressively, the model loses general wit. Nova Forge mitigates this by starting earlier in the lifecycle and blending curated data, but you still need a balanced curriculum and staged evaluations.

Data skew. If your “proprietary” data is inconsistent, stale, or heavy on edge cases, your model will learn the wrong lessons. Invest in curation and label hygiene; small, clean sets beat giant messy ones.

Latency stacking. Streaming voice + tool calls + grounding can create hidden delays. Profile the full path and prune tool invocations with Policy. Sometimes a slightly larger model with fewer tool hops is cheaper and faster.

Preview complacency. Pro and Omni are in preview—great for experimentation but not an excuse to skip exit criteria. Treat previews like pre‑GA libraries: pin versions, isolate environments, and plan for change.

FAQ: quick answers leaders ask

Do we still need retrieval if we build on Nova Forge?

Usually yes, but less of it. A domain‑aware model plus a small, high‑quality retrieval layer beats a generic model with a heavy RAG scaffold. Keep retrieval for fresh facts, compliance text, and long‑tail references.

Where should we start: Nova 2 Lite, Pro, Omni, or Sonic?

Pick by task. Lite for everyday reasoning at speed and cost efficiency, Pro (preview) for complex multi‑step work and agents, Omni (preview) for multimodal reasoning and image generation, and Sonic for real‑time voice. If you’re building a Novella, you’ll likely evaluate on Lite for iteration speed, then graduate.

How does AgentCore Policy differ from app‑level “if” statements?

Policies are centralized, compiled to Cedar, and enforced on every tool call. You’re not relying on developers to remember guardrails in each service; you declare rules once and get auditability across agents.

Can we run this across clouds?

Inference endpoints and agents can integrate with external tools over MCP/A2A and standard APIs. If you’re navigating multi‑cloud connectivity and governance, our practical multicloud guide will help you avoid brittle tunnels: a 30‑day plan for real multicloud interconnects.

What to do next (30–60–90 days)

Day 0–30. Stand up a Forge sandbox in us‑east‑1. Define golden tasks, collect 5–10k high‑signal examples, and wire AgentCore with Policy/Evaluations around a thin agent. Ship one internal demo to a tough stakeholder.

Day 31–60. Run ablations: Lite vs your baseline model with and without retrieval; turn Policy rules on and off; measure cost per task and error classes. Start a Sonic pilot if you have voice workflows, with strict streaming budgets.

Day 61–90. Decide: graduate to a Novella and schedule a production rollout behind feature flags, or park the effort if the ROI isn’t there. If you go forward, draft your change plan (SLOs, rollback, human‑in‑the‑loop) and training refresh cadence.

Where we can help

If you want an experienced partner to pressure‑test your approach, we design and ship agentic systems with hard budgets and measurable outcomes. See our recent work in the portfolio, browse focused AI and platform services, or talk to our team about a two‑week discovery sprint. And if you’re managing the AI spend side of the house, our guidance on controlling premium request budgets and avoiding surprise charges pairs well with the governance advice above.

Zooming out, the center of gravity is shifting from “use a smart model via API” to “own the intelligence that makes you competitive.” With Nova Forge and the latest AgentCore capabilities, AWS just made that path accessible to teams that care about both quality and control. Choose wisely where you build, where you buy, and where you do neither.

CTO team reviewing a 30–60–90 AI plan
Written by Viktoria Sulzhyk · BYBOWU

