Amazon Bedrock AgentCore is now the most opinionated, enterprise-grade way to run AI agents on AWS, and on December 2, 2025, it added Policy and Evaluations (both in preview) to help turn lab experiments into production software. General availability landed on October 13, 2025, with long-lived runtimes, VPC/PrivateLink support, IAM-aware tool access, and OTEL-friendly observability. If you’ve been waiting for the moment to standardize agent operations on AWS, this is that moment. (aws.amazon.com)
Here’s the thing: AgentCore isn’t just for chatbots. It’s a platform layer that lets agents plan, act, and learn across your systems—securely—while you keep vendor choice for models and frameworks. With fresh updates announced during AWS re:Invent week, plus Amazon’s broader push on Nova models and frontier agents, the pieces finally click for teams that need both velocity and governance. (aboutamazon.com)
What changed this week in AgentCore
On December 2, 2025, Amazon added two major capabilities: Policy (preview) and Evaluations (preview). Policy lets you write natural-language rules that compile to Cedar and intercept every tool call at the Gateway, so your agent stays inside guardrails without bespoke middleware. Evaluations gives you 13 built-in quality checks (helpfulness, tool choice, accuracy, etc.) and dashboards in CloudWatch to catch regressions before they hit customers. Evaluations is in preview in four regions; Policy is in preview wherever AgentCore is available. (aws.amazon.com)
With GA on October 13, 2025, AgentCore shipped core building blocks: Runtime (with eight-hour execution windows and session isolation), Gateway (MCP server support plus IAM authorization), Identity (acting on behalf of users with secure vault storage for refresh tokens), Memory (a self-managed strategy and now episodic memory), and Observability (OTEL-compatible; works with CloudWatch, Dynatrace, Datadog, LangSmith, Langfuse, and more). It’s available in nine regions across the U.S., Europe, and APAC. (aws.amazon.com)
Why AgentCore matters (and how it’s different)
Most teams have hacked together LangGraph or CrewAI flows, tied them to a handful of tools, and then hit a wall: brittle auth, short runtimes, no standard for tool discovery, no consistent policy story, and no clean path to prod observability. AgentCore addresses those gaps with a first-class Gateway (MCP-compatible), long-lived Runtime, identity-aware actions, and policy enforcement at the moment of tool invocation. You keep model portability (Amazon Nova, Anthropic Claude, or models hosted outside Bedrock) while centralizing the control plane. (aws.amazon.com)
Amazon’s broader context matters too. At re:Invent 2025, AWS highlighted frontier agents designed to work autonomously for hours or days, alongside new chips and infrastructure for scaling inference and training. That signals an operational vision: agents won’t be ephemeral; they’ll be long-running services that need budgets, SLOs, policies, and audits—the stuff your platform team already manages. (aboutamazon.com)
Amazon Bedrock AgentCore architecture: a blueprint that ships
Start with a dedicated VPC for AgentCore endpoints. Place AgentCore Gateway behind PrivateLink and wire it to your internal MCP servers that expose tools like “CreateTicket,” “QueryInventory,” or “RefundOrder.” Use AgentCore Identity to broker OAuth/IAM on behalf of the caller. For long-running tasks (ETL orchestration, RMA workflows, billing adjustments), lean on the eight-hour Runtime, and add event-driven restarts if you need multi-day plans. Pipe logs, traces, and evaluation metrics into CloudWatch, then mirror them to your observability platform of choice. (aws.amazon.com)
Third-party integrations are arriving fast. Elastic announced observability for AgentCore agents directly in Elasticsearch, while Informatica shipped MCP servers so agents can operate on governed data with lineage and quality checks intact—useful for regulated workloads. These aren’t toy demos; they’re the connective tissue enterprises need to make agents reliable. (ir.elastic.co)
Is Amazon Bedrock AgentCore tied to Amazon Nova?
No. AgentCore is model-agnostic, but it pairs well with the Nova 2 family. Nova 2 Omni entered preview on December 2 and supports multimodal reasoning (text, image, video, and speech), 1M-token context, and both text and image generation—handy when your agent must parse meetings, update docs with visuals, or verify screenshots. You can also bring Anthropic Claude or other models via Bedrock and MCP; AgentCore’s value is the operational fabric, not model lock-in. (aws.amazon.com)
How does AgentCore compare to MCP alone or other agent SDKs?
Think of MCP as how tools are described and discovered. AgentCore Gateway can talk MCP, but adds IAM-aware auth, policy interception, and centralized routing. Pair that with long-lived Runtime, Memory, Identity, and Evaluations, and you’ve got a full stack for production—not just local dev flows. You can still use LangGraph, CrewAI, or LlamaIndex; AgentCore doesn’t fight that. It standardizes the rails underneath. (aws.amazon.com)
14-day adoption plan for Amazon Bedrock AgentCore
Assume one staff engineer, one platform engineer, and a security partner for reviews. Target a narrow, high-friction workflow (e.g., a support refund that touches CRM, order DB, and billing API).
Days 1–2: Foundations
- Create a dedicated VPC and set up PrivateLink endpoints for AgentCore services. Configure CloudFormation stacks and tagging for traceability. (aws.amazon.com)
- Stand up AgentCore Gateway and connect one MCP server that wraps a single internal API (read-only first). Use IAM authorization as your default stance. (aws.amazon.com)
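A read-only first tool, in the spirit of the Days 1–2 steps, might be described and guarded roughly like this. It is a minimal standard-library sketch, not the AgentCore or MCP SDK: the tool name `QueryInventory` comes from the examples earlier in this article, while the `sku` field, `READ_ONLY_TOOLS`, and `validate_call` are hypothetical. The `name` / `description` / `inputSchema` layout follows the shape MCP uses for tool definitions.

```python
# Hypothetical descriptor for one read-only tool, laid out like an MCP tool
# definition (name, description, JSON Schema for inputs).
QUERY_INVENTORY_TOOL = {
    "name": "QueryInventory",
    "description": "Look up stock for a SKU (read-only).",
    "inputSchema": {
        "type": "object",
        "properties": {"sku": {"type": "string"}},
        "required": ["sku"],
    },
}

# Assumption: the pilot keeps an explicit allowlist of read-only tools.
READ_ONLY_TOOLS = {QUERY_INVENTORY_TOOL["name"]: QUERY_INVENTORY_TOOL}

# Only the JSON Schema types this sketch needs.
_JSON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def validate_call(tool_name: str, arguments: dict) -> list[str]:
    """Return a list of problems; an empty list means the call looks valid."""
    tool = READ_ONLY_TOOLS.get(tool_name)
    if tool is None:
        return [f"tool {tool_name!r} is not on the read-only allowlist"]
    schema = tool["inputSchema"]
    problems = []
    for field in schema.get("required", []):
        if field not in arguments:
            problems.append(f"missing required field {field!r}")
    for field, value in arguments.items():
        spec = schema["properties"].get(field)
        if spec is None:
            problems.append(f"unexpected field {field!r}")
        elif not isinstance(value, _JSON_TYPES[spec["type"]]):
            problems.append(f"field {field!r} should be {spec['type']}")
    return problems
```

Starting with an allowlist means a write-capable tool cannot slip in by accident; promoting one later becomes an explicit change.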
Days 3–4: Identity and auth
- Configure AgentCore Identity to act on behalf of a pilot group. Store refresh tokens in the built-in secure vault and define least-privilege scopes with IAM. (aws.amazon.com)
- Establish a shared secret rotation schedule and test impersonation flows in staging.
Days 5–6: Runtime and memory
- Implement the agent’s core plan-act loop in your preferred SDK (LangGraph or CrewAI). Target an execution window under 45 minutes. Enable episodic memory for user-specific context retention. (aws.amazon.com)
- Wire a self-managed memory strategy for auditability. Keep PII encrypted at rest, and document the retention policy.
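If you run a self-managed memory store, the documented retention policy eventually needs to be enforced in code. The sketch below shows one way a purge job could look; the 30-day window and the record shape (`created_at` as a timezone-aware `datetime`) are assumptions for illustration, and encryption at rest is deliberately out of scope here.

```python
from datetime import datetime, timedelta, timezone

# Placeholder window; substitute whatever your documented policy says.
RETENTION = timedelta(days=30)


def purge_expired(records: list[dict], now: datetime) -> list[dict]:
    """Keep only memory records that are still inside the retention window."""
    cutoff = now - RETENTION
    return [r for r in records if r["created_at"] >= cutoff]
```

Passing `now` in explicitly (for example `datetime.now(timezone.utc)`) keeps the function easy to test and audit.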
Days 7–8: Policy guardrails
- Define Policy (preview) in natural language for hard boundaries: allowed tools, monetary limits, data scope, and escalation conditions (e.g., “any refund over $200 requires human sign-off”). Let AgentCore convert to Cedar, then verify behavior by simulating tool calls. (aws.amazon.com)
- Add rate limits and “break glass” paths for operators in your policy set.
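Before you rely on the generated Cedar, it can help to write down the behavior you expect and simulate tool calls against it. The following is a plain-Python stand-in for that expectation, not Cedar and not the AgentCore Policy API; `ToolCall`, `ALLOWED_TOOLS`, and `evaluate` are hypothetical names, and the 200 threshold mirrors the example rule above.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class ToolCall:
    tool: str
    amount: Decimal = Decimal("0")
    human_approved: bool = False


# Assumptions for illustration: the allowlist and the 200 threshold mirror
# the example rule in the text; they are not AgentCore defaults.
ALLOWED_TOOLS = {"QueryInventory", "CreateTicket", "RefundOrder"}
REFUND_LIMIT = Decimal("200")


def evaluate(call: ToolCall) -> str:
    """Return 'allow', 'deny', or 'escalate' for a proposed tool call."""
    if call.tool not in ALLOWED_TOOLS:
        return "deny"  # default-deny anything not explicitly listed
    if call.tool == "RefundOrder" and call.amount > REFUND_LIMIT:
        # Over the limit: only proceeds with human sign-off.
        return "allow" if call.human_approved else "escalate"
    return "allow"
```

A table of expected decisions like this doubles as a regression suite: replay the same calls against the real policy engine and compare the outcomes.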
Days 9–10: Evaluations and observability
- Turn on Evaluations (preview) with built-in checks and add a few custom evaluators for domain-specific accuracy. Publish dashboards to CloudWatch and subscribe alerts to PagerDuty/Slack. (aws.amazon.com)
- If you already run Elasticsearch or Datadog, mirror traces and logs to keep your SRE playbooks consistent. (ir.elastic.co)
Days 11–12: Write access and escalation
- Promote one write-capable tool behind Gateway (e.g., “CreateRefund”). Require policy approval for amounts above your threshold and log every action with a human-readable audit trail. (aws.amazon.com)
- Add a fallback: if the agent fails policy checks or eval scores drop below a threshold, route to a human queue with context attached.
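The fallback in the last step can be sketched as a small routing function. Everything here is an assumption for illustration: scores normalized to 0..1, a floor of 0.8, and the `AgentResult` shape and queue names. The point is only that the handoff carries its reasons and transcript with it.

```python
from dataclasses import dataclass, field

# Assumption: scores are normalized to 0..1 and 0.8 is a placeholder floor.
EVAL_FLOOR = 0.8


@dataclass
class AgentResult:
    case_id: str
    policy_decision: str            # "allow", "deny", or "escalate"
    eval_scores: dict[str, float]   # e.g. {"helpfulness": 0.9}
    transcript: list[str] = field(default_factory=list)


def route(result: AgentResult) -> dict:
    """Decide whether the agent's action proceeds or goes to a human queue."""
    reasons = []
    if result.policy_decision != "allow":
        reasons.append(f"policy:{result.policy_decision}")
    low = sorted(k for k, v in result.eval_scores.items() if v < EVAL_FLOOR)
    reasons.extend(f"eval:{name}" for name in low)
    if not reasons:
        return {"queue": "auto", "case_id": result.case_id}
    # Attach context so the reviewer does not start from scratch.
    return {
        "queue": "human",
        "case_id": result.case_id,
        "reasons": reasons,
        "transcript": result.transcript,
    }
```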
Days 13–14: Dry runs and limited beta
- Run 50–100 replayed cases through the agent, measuring time-to-resolution, error rate, and human-touch rate. Ship to a small group in production with feature flags and budget caps.
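For the dry run, a small aggregation over replayed cases is usually enough to get the three numbers. This is a sketch under assumed names (`ReplayOutcome`, `summarize`); how you capture each outcome depends on your replay harness.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class ReplayOutcome:
    seconds_to_resolution: float
    errored: bool
    needed_human: bool


def summarize(outcomes: list[ReplayOutcome]) -> dict[str, float]:
    """Aggregate a replay batch into the three pilot metrics."""
    if not outcomes:
        raise ValueError("no replayed cases to summarize")
    n = len(outcomes)
    return {
        "cases": n,
        "median_seconds_to_resolution": median(
            o.seconds_to_resolution for o in outcomes
        ),
        "error_rate": sum(o.errored for o in outcomes) / n,
        "human_touch_rate": sum(o.needed_human for o in outcomes) / n,
    }
```

The median is used rather than the mean so a single stuck case does not dominate the time-to-resolution figure.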
Choosing models: Nova 2 Omni or not?
Nova 2 Omni is compelling for agents that ingest multimodal inputs (tickets with screenshots, phone calls, product photos) and produce text plus images for user-facing summaries. It supports 1M-token context and 200+ languages for text, with 10 languages for speech, which helps for global operations and long conversations. If your use case is mostly text analysis or retrieval-augmented generation, Claude or other Bedrock models may be more cost-effective; keep Nova 2 Omni for the workflows where images and speech materially improve success rates. (aws.amazon.com)
For deeper customization, pair AgentCore with Nova Forge to create domain-specific models, which is useful when you need consistent reasoning over your jargon and data. We covered concrete steps for leaders evaluating Nova Forge in our piece on what CTOs should do this week with Nova Forge. That strategy complements AgentCore’s policy and evaluation layers.
Infrastructure realities: performance, chips, and AI Factories
Under the hood, AWS is pushing two levers: its Trainium line and NVIDIA-powered UltraServers. AWS said Trainium3 servers deliver roughly 4× the performance of prior infrastructure with about 40% less power, while it also partners with NVIDIA on AI Factory blueprints and next-gen GPU UltraServers (including GB300 NVL72-class systems). For future-proofing, note AWS plans to adopt NVLink Fusion in Trainium4, positioning for larger-scale model training and faster inter-chip communication. For most enterprises, that translates to better price/perf and shorter procurement cycles as managed services adopt the new instances. (reuters.com)
If you’re hybrid or multicloud, review your interconnect and routing story before you scale agent workloads. We’ve laid out practical guidance in our piece on AWS Interconnect, and if your stack spans Google Cloud, see our take on multicloud interconnect with Google. Getting this right avoids surprise egress charges and latency cliffs as agents start chaining actions across systems.
Security and compliance: practical guardrails
Use Policy to encode rules for money movement, PII access, and write actions. Because policies compile to Cedar, your security team can review the source of truth and audit it alongside IAM. Enforce scoped credentials via Identity so agents act on behalf of a real user role, not a god-token. For data governance, route agent queries through governed sources; Informatica’s MCP servers help when you need lineage and quality enforcement in-line with actions. (aws.amazon.com)
Observability isn’t optional. Centralize traces, tool-call logs, and eval scores. Feed them to CloudWatch and mirror to your standard platform (Elastic/Datadog). Build an on-call playbook with three checks: policy violation, tool health, and model degradation. That keeps incidents resolvable by the same SREs who run your microservices. (ir.elastic.co)
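The three-check playbook can be expressed as a first-pass triage function that an on-call engineer (or an alarm handler) runs over a recent window of metrics. The thresholds and parameter names below are placeholders, not values from AgentCore or CloudWatch.

```python
# Placeholder thresholds; tune them against your own traffic.
MAX_POLICY_DENIES = 0        # any unexpected deny in the window is worth a look
MAX_TOOL_ERROR_RATE = 0.05   # 5% of tool calls failing
MAX_EVAL_DROP = 0.10         # eval score falling 0.10 below baseline


def triage(policy_denies: int, tool_error_rate: float,
           eval_score: float, baseline_eval_score: float) -> list[str]:
    """Run the three playbook checks in order and return what fired."""
    findings = []
    if policy_denies > MAX_POLICY_DENIES:
        findings.append("policy_violation")
    if tool_error_rate > MAX_TOOL_ERROR_RATE:
        findings.append("tool_health")
    if baseline_eval_score - eval_score > MAX_EVAL_DROP:
        findings.append("model_degradation")
    return findings
```

Checking in this order tends to isolate the cheapest explanations first: a denied call or a failing tool is easier to confirm than a quality regression in the model.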
The build vs. buy question: where AI Factories fit
Amazon’s “AI Factories” concept formalizes something many platform teams already do: bring high-end AI infrastructure into a controlled environment with standardized networking, storage, and governance. With NVIDIA partnerships and AWS-managed stacks, you can scale training and inference without bespoke integration projects. For most enterprises, it’s not either/or with AgentCore—think of AI Factories as the hardware/infra tier and AgentCore as the software/operations tier for agents. (techradar.com)
Practical checklist: readiness in one sprint
Use this to sanity-check your plan before the pilot:
- Scope: One workflow with measurable ROI (refunds, claims, catalog updates).
- Data: RAG corpus curated and tagged; governed connectors if regulated. (informatica.com)
- Auth: Roles mapped; least-privilege IAM; vault-backed refresh tokens. (aws.amazon.com)
- Tools: 1–3 MCP-defined tools behind Gateway; read-only first. (aws.amazon.com)
- Policy: Cedar-backed rules for money/PII; human-in-the-loop thresholds. (aws.amazon.com)
- Runtime: Timebox actions; use retries and idempotency keys.
- Memory: Episodic memory for personalization; retention schedule documented. (aws.amazon.com)
- Observability: CloudWatch dashboards plus your standard APM. (ir.elastic.co)
- Models: Default to Claude/Nova based on modality; Nova 2 Omni when multimodal matters. (aws.amazon.com)
- Exit criteria: Eval scores above threshold and zero policy violations across 100 cases. (aws.amazon.com)
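The “retries and idempotency keys” item in the checklist is worth making concrete, because retrying a write without a stable key is how duplicate refunds happen. The sketch below is a generic pattern, not an AgentCore API: `TransientError`, `call_with_retries`, and the `tool(payload, key)` calling convention are hypothetical, and timeboxing would be layered on top.

```python
import time
import uuid


class TransientError(Exception):
    """Raised by a tool call that is safe to retry."""


def call_with_retries(tool, payload: dict, attempts: int = 3,
                      base_delay: float = 0.5, sleep=time.sleep):
    """Call `tool(payload, idempotency_key)` with exponential backoff.

    The same key is reused on every attempt so the downstream service can
    deduplicate a write that succeeded but whose response was lost.
    """
    key = str(uuid.uuid4())
    for attempt in range(attempts):
        try:
            return tool(payload, key)
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            sleep(base_delay * 2 ** attempt)
```

The key is generated once, outside the loop. Generating it per attempt would defeat the purpose, since the downstream service would see each retry as a new request.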
What to do next (developers and leaders)
- Developers: Spin up Gateway with a single MCP server; add a read-only tool; test policy interception against malformed inputs. (aws.amazon.com)
- Platform/SRE: Ship a baseline dashboard with tool-call latency, policy denies, and eval trends. Mirror traces to your existing stack. (ir.elastic.co)
- Security: Review Cedar policies alongside IAM; run tabletop exercises for unsafe actions and token leakage. (aws.amazon.com)
- Product: Pick one outcome metric (time-to-resolution, CSAT uplift, manual hours saved) and commit to it for the pilot.
- Execs: Budget for model experiments plus integration work; consider where Nova Forge fits after you validate the workflow. See our Nova Forge builder’s guide for a reality check on customization tradeoffs.
Related reading from our team
If you’re evaluating multimodal reasoning, start with our Nova 2 Omni builder’s playbook. For org-wide readiness and costs, we also covered how premium-request billing can surprise AI budgets—and what to change in your controls. Finally, if your stack spans Kubernetes, make sure your clusters are ready for agent workloads; our Kubernetes 1.35 upgrade playbook outlines the must-do items before you scale inference on EKS.
FAQ
Will AgentCore lock me into Amazon models?
No. AgentCore supports any framework and model, including models hosted outside Amazon Bedrock. The value is operations (policy, identity, runtime, and observability) around whichever models you pick. (aws.amazon.com)
How long can agents run?
AgentCore Runtime supports extended execution windows (up to eight hours) with session isolation, and preview features add bidirectional streaming for real-time voice agents. For multi-day plans, design resumable tasks and use events to rehydrate context. (aws.amazon.com)
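One way to think about a resumable multi-day plan is as an ordered list of steps with a checkpoint written after each one, so that any later session (triggered by an event, a schedule, or an operator) can rehydrate and continue. The sketch below uses a local JSON file as the checkpoint purely for illustration; `run_resumable` and the step shape are hypothetical, and in practice the state would live in a durable store.

```python
import json
from pathlib import Path


def run_resumable(steps, checkpoint: Path, max_steps_per_session: int) -> bool:
    """Run named steps in order, persisting progress after each one.

    Returns True when every step has finished, False if the session budget
    ran out and a later invocation should pick up where this one stopped.
    """
    state = {"done": [], "context": {}}
    if checkpoint.exists():
        state = json.loads(checkpoint.read_text())  # rehydrate prior context
    executed = 0
    for name, fn in steps:
        if name in state["done"]:
            continue  # finished in an earlier session
        if executed >= max_steps_per_session:
            return False
        state["context"] = fn(state["context"])
        state["done"].append(name)
        checkpoint.write_text(json.dumps(state))
        executed += 1
    return True
```

Each step takes the accumulated context and returns the updated one, so the context must stay JSON-serializable for the checkpoint to round-trip.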
What about cost control?
Use Policy to hard-cap expensive actions and Evaluations to catch low-quality loops. On the infra side, AWS’s newer chips and GPU UltraServers aim to improve price/perf as managed services adopt them; revisit instance choices quarterly. (aws.amazon.com)
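Alongside policy rules, a per-session spend guard in your own agent code is a cheap second line of defense. This is a minimal sketch with hypothetical names (`SessionBudget`, `BudgetExceeded`); how you estimate the cost of a model or tool call is up to you.

```python
from decimal import Decimal


class BudgetExceeded(Exception):
    """Raised when a charge would push a session past its cap."""


class SessionBudget:
    """Track spend for one agent session against a hard cap."""

    def __init__(self, cap: Decimal):
        self.cap = cap
        self.spent = Decimal("0")

    def charge(self, cost: Decimal) -> Decimal:
        """Record a cost and return the remaining budget, or refuse it."""
        if cost < 0:
            raise ValueError("cost must be non-negative")
        if self.spent + cost > self.cap:
            raise BudgetExceeded(f"{cost} would exceed cap {self.cap}")
        self.spent += cost
        return self.cap - self.spent
```

Calling `charge` before each expensive action, rather than after, means a runaway loop stops at the cap instead of one call past it.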
Is Nova 2 Omni production-ready?
Nova 2 Omni is in preview as of December 2, 2025. Treat it like you would any preview: isolate traffic, compare against a stable baseline model, and watch evals closely. (aws.amazon.com)
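Isolating preview traffic can be as simple as a deterministic split on session ID, so the same session always lands on the same model and eval scores stay comparable. The sketch below is a generic canary pattern; `pick_model` and the model labels are placeholders, not real Bedrock model IDs.

```python
import hashlib


def pick_model(session_id: str, preview_percent: int,
               preview_model: str = "preview-model",
               baseline_model: str = "baseline-model") -> str:
    """Deterministically send a fixed share of sessions to the preview model."""
    if not 0 <= preview_percent <= 100:
        raise ValueError("preview_percent must be between 0 and 100")
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket 0..99
    return preview_model if bucket < preview_percent else baseline_model
```

Because the bucket depends only on the session ID, raising the percentage later only moves additional sessions onto the preview model; sessions already on it stay there.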
Zooming out, 2026 will be the year agents go from demos to durable services. If you standardize on Amazon Bedrock AgentCore now—policies, evaluations, and all—you’ll ship faster, sleep better, and keep the audit trail your board and regulators will ask for later. If you want help blueprinting your pilot, our team does this work every week—see our services and contact us.