AWS Graviton5 just landed, alongside EC2 M9g instances in preview (Dec 4, 2025). The headline: up to 25% better compute performance versus Graviton4, 192 cores per chip, larger L3 cache, and higher network and EBS bandwidth. For teams already on Graviton3/4, or hesitating on Arm64, this is the moment to lock in a plan. This guide lays out exactly how to move key services to AWS Graviton5 without drama, with a 90-day path that product leaders and platform teams can share and execute.
Here’s the thing: announcements don’t cut your bill. Consistent, targeted migrations do. If you pick the right first movers (stateless services, Java and Node.js APIs, Redis/Memcached fleets, event processors), you’ll see fast wins while you de-risk the rest.
What actually shipped—and why it matters
Let’s anchor on the specifics that affect your roadmap.
• EC2 M9g (preview) powered by AWS Graviton5. AWS reports up to 25% higher compute performance versus Graviton4-based M8g. The chip exposes 192 cores and a materially larger L3 cache, translating to steadier per-core latency for CPU-bound services.
• Bandwidth uplifts. Compared to prior generations, networking and EBS throughput are higher—and on the largest sizes, network bandwidth can effectively double. That’s not a vanity metric: it reduces tail latency for chatty microservices and speeds backup/restore windows.
• Security posture. Graviton5 continues always-on memory encryption and pointer authentication, and the Nitro System adds a Nitro Isolation Engine using formal verification techniques to strengthen tenant isolation. If you’re selling into regulated verticals, this is a board-level talking point.
• Compatibility and tooling. The existing Graviton ecosystem (Porting Advisor, Graviton Fast Start, ARM builds in major language toolchains) carries forward. If you’re already publishing multi-arch images, your migration is mostly scheduling and sizing.
The AWS Graviton5 90‑day migration plan
This is the exact cadence I’d run inside a product organization with dozens of services. Adjust for your size, but keep the order: measure, prove, then scale.
Days 0–30: Discovery and baselines
1) Inventory and tag. Export a service list with runtime, CPU hours/month, p95 latency, and current instance families. Tag the “Pioneer 10” candidates: stateless APIs, job runners, web backends, and caches that don’t depend on x86-only binaries (e.g., AVX-only libs).
2) Establish control benchmarks. For each Pioneer service, record a 7‑day baseline: CPU utilization at p90, request/sec, p95/p99 latency, and EBS throughput. Save flamegraphs or async-profiler snapshots for at least your top two Java services to target hot paths.
3) Multi-arch builds. Switch your CI to publish linux/amd64 and linux/arm64 images using Buildx. Pin base images that have official arm64 variants (e.g., alpine, debian, distroless). Avoid surprises by running container image vulnerability scans separately for both architectures.
4) Language specifics. Java: use a current JDK with AArch64 JIT improvements, enable container-aware flags, and test G1 vs ZGC on arm64. Node.js: ensure native modules (bcrypt, sharp, grpc) are either pure JS/WebAssembly or have arm64 prebuilds. Python: verify wheels exist for scientific libs; otherwise plan to compile on arm64 build agents.
5) Cost and capacity hypothesis. Based on the M8g→M9g gains, estimate how many fewer vCPUs you need for the same throughput. Keep the model conservative (assume 15–20% improvement for mixed workloads) and pressure-test it with load tests.
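To keep that hypothesis honest, write the model down so the assumptions are explicit and easy to revisit after load tests. Here is a minimal sketch in Python; the uplift factors, utilization targets, and fleet size are placeholders, not measured Graviton5 numbers.

```python
# Rough sizing model for an M8g -> M9g move. Every number here is an
# assumption to be replaced with your own baselines and load-test results.
import math

def required_vcpus(current_vcpus: int, current_avg_util: float,
                   target_avg_util: float, per_vcpu_uplift: float) -> int:
    """Estimate vCPUs needed for the same throughput on the new generation."""
    busy_vcpus = current_vcpus * current_avg_util          # work being done today
    new_busy_vcpus = busy_vcpus / (1.0 + per_vcpu_uplift)  # same work on faster cores
    return math.ceil(new_busy_vcpus / target_avg_util)     # provision to target utilization

# Example: 640 vCPUs at 40% average utilization, conservative uplift range.
for uplift in (0.15, 0.20):
    print(f"{uplift:.0%} uplift -> {required_vcpus(640, 0.40, 0.40, uplift)} vCPUs")
```

With these inputs the fleet shrinks from 640 to roughly 530 to 560 vCPUs, the 13 to 17% reduction the conservative model predicts. Treat that as the hypothesis your load tests either confirm or correct.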
Days 31–60: Port, build, and pilot
6) Stand up a parallel arm64 environment. Mirror production topology for the Pioneer 10 in a separate VPC or namespace. Use the same autoscaling policies and the same Envoy/Nginx sidecars to keep apples-to-apples comparisons.
7) Rebuild performance artifacts. Generate new flamegraphs on arm64, compare code paths, and confirm that JIT warmup is acceptable under your pod rollout strategy. For Go services, rebuild with GOARCH=arm64 and re-run microbenchmarks to catch unexpected alignment issues.
8) Data path checks. Test backup/restore timing with EBS on M9g, and measure inter-service RTT from M9g to your data stores. If you rely on S3 heavily, run a parallel multipart throughput test to validate that your backup windows shrink as expected. If your pipelines move multi-terabyte objects, revisit your retention and tiering design in light of larger object options discussed in our analysis of S3 50TB data pipelines.
9) Run a 10–20% canary. Shift a portion of live traffic to arm64 and bake for a full business cycle (at least one weekday peak and one weekend). Watch cold-start behavior for Functions-as-a-Service that call into your services and look for retry storms. A simple pass/fail gate for the bake is sketched after this list.
10) Update capacity plans. With live data, refine instance size selection. Graviton generations reward “right-sizing”: fewer, larger instances often beat many small ones because of cache behavior and network headroom.
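To make the bake decision in step 9 explicit rather than a judgment call over a dashboard, codify it. The sketch below is illustrative only: the thresholds are assumptions to tune per service, and pulling the aggregates is left to whatever observability stack you already run.

```python
from dataclasses import dataclass

@dataclass
class Window:
    """Metrics aggregated over the bake window for one fleet."""
    p99_ms: float       # p99 latency
    error_rate: float   # errors / requests
    cpu_p90: float      # p90 CPU utilization, 0.0-1.0

def canary_passes(canary: Window, control: Window,
                  max_p99_regression: float = 0.05,
                  max_error_delta: float = 0.001,
                  max_cpu_p90: float = 0.75) -> bool:
    """True if the arm64 canary is safe to widen (illustrative thresholds)."""
    if canary.p99_ms > control.p99_ms * (1 + max_p99_regression):
        return False   # tail latency regressed beyond the allowance
    if canary.error_rate - control.error_rate > max_error_delta:
        return False   # error rate rose more than 0.1 percentage points
    if canary.cpu_p90 > max_cpu_p90:
        return False   # not enough headroom for the next traffic step
    return True

# Made-up numbers from a one-week bake: arm64 canary vs. x86 control.
print(canary_passes(Window(182.0, 0.0009, 0.58), Window(190.0, 0.0011, 0.66)))
```

Wire the same check into the rollout pipeline so widening the canary requires a green result, not a meeting.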
Days 61–90: Optimize and roll out
11) Tune the memory and GC profile. On Java, evaluate string dedup, region sizing, and pause-time targets on arm64. For Node.js, trim native extensions and move hot paths to WebAssembly where feasible; the arm64 codegen path in V8 is excellent, but native add-ons can drag.
12) Adopt multi-arch by default. Make arm64 the default for new services, with amd64 as a compatibility exception. Capture the policy in a one-pager and PR template so teams don’t regress.
13) Expand to stateful. After two weeks of stable canaries, move Redis/Memcached fleets and then selected DB read replicas. Validate failover timing and crash recovery on arm64 guests. A minimal failover timing probe is sketched after this list. If you’re planning hybrid interconnects, revisit your throughput and routing; our multicloud interconnect playbook outlines patterns that pair well with high‑bandwidth instances.
14) Decommission or repurpose x86. Consolidate amd64 capacity into fewer AZs or reuse it for workloads that truly need x86-only extensions. Update your AMI catalog, golden images, and Packer templates so platform defaults align with Graviton5.
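For step 13, the failover timing question is easy to answer empirically. Here is a minimal probe, assuming the redis-py client and a placeholder canary endpoint; run it while you trigger a controlled failover and record how long writes stay unavailable.

```python
# Failover timing probe. The endpoint is a placeholder; point it at the
# arm64 replica group under test and trigger a controlled failover.
import time
import redis

r = redis.Redis(host="redis-canary.internal", port=6379, socket_timeout=0.25)

outage_started = None
while True:
    try:
        r.set("failover-probe", str(time.time()), ex=60)
        if outage_started is not None:
            print(f"writes recovered after {time.time() - outage_started:.2f}s")
            break
    except redis.exceptions.RedisError:
        if outage_started is None:
            outage_started = time.time()
            print("writes failing; failover in progress")
    time.sleep(0.1)
```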
People also ask
Is AWS Graviton5 “production ready” or just a preview?
The M9g instance family is in preview as of December 4, 2025, but the Graviton lineage is battle-tested across thousands of customers. If you need GA-only options, keep critical x86 services on current families and pilot non-critical workloads on Graviton5 while you wait for GA.
Will my existing x86 Docker images run on Graviton?
Not without emulation. Avoid running amd64 images on arm64 nodes via qemu in production—it’s a debugging crutch, not a migration plan. Build multi-arch images and verify native arm64 dependencies. Most official images now publish arm64 variants.
What performance gains should I expect?
Expect up to 25% compute uplift over Graviton4, with the larger L3 cache translating to steadier p99s for CPU-bound services. For web apps and data stores, AWS cites up to 30–35% gains over the prior generation in targeted benchmarks. Your mileage will vary with I/O and code paths; measure under your traffic mix.
What breaks?
Common snags: AVX-only libraries (some ML preprocessors, media transcoders), pinned amd64 base images in Dockerfiles, amd64-only CI runners, and vendor agent daemons without arm64 builds. Create a compatibility matrix and resolve before cutover.
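One cheap way to build that matrix is to ask each image which architectures it actually publishes. The sketch below shells out to `docker manifest inspect` (available on recent Docker CLIs; older versions may require experimental mode); the image names are placeholders for your own base and vendor images.

```python
# Flag images that do not publish an arm64 variant.
import json
import subprocess

IMAGES = ["debian:bookworm-slim", "vendor/agent:7.2.1", "internal/base-jdk:21"]

def architectures(image: str) -> set[str]:
    out = subprocess.run(["docker", "manifest", "inspect", image],
                         capture_output=True, text=True, check=True).stdout
    doc = json.loads(out)
    if "manifests" in doc:  # multi-arch images publish a manifest list
        return {m["platform"]["architecture"] for m in doc["manifests"]}
    return set()            # single manifest: architecture unclear, check manually

for image in IMAGES:
    try:
        archs = architectures(image)
    except subprocess.CalledProcessError:
        archs = set()
    print(f"{image:35s} {sorted(archs) or 'no manifest list'}"
          f" {'ok' if 'arm64' in archs else '<- needs attention'}")
```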
Performance and cost math you can take to finance
Let’s run an illustrative scenario for a Java API tier processing 80k RPS at 40% average CPU on M8g. Move that same tier to M9g with a conservative 20% compute improvement and higher EBS/network headroom. If you right-size to reduce instance count by ~15% and also gain 10–15% tail-latency improvements, you typically unlock two levers: lower spend (fewer instances) and higher conversion (faster pages). Even without GA pricing, those ratios justify a limited-scope pilot now. Capture before/after spend for the CFO, and include the knock-on savings from smaller backup windows and less capacity headroom.
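Here is the same scenario as a worked calculation. The hourly rate is a deliberate placeholder, since M9g pricing was not public at preview, and the instance reduction is the conservative right-sizing estimate from the canary phase.

```python
HOURS_PER_MONTH = 730

current_instances = 100      # M8g fleet behind the 80k RPS Java tier
hourly_rate = 1.00           # $/hr placeholder, NOT a published price
instance_reduction = 0.15    # right-sizing estimate from canary data

current_monthly = current_instances * hourly_rate * HOURS_PER_MONTH
new_instances = round(current_instances * (1 - instance_reduction))
new_monthly = new_instances * hourly_rate * HOURS_PER_MONTH  # assume price parity until GA

print(f"current:  {current_instances} instances  ${current_monthly:,.0f}/mo")
print(f"proposed: {new_instances} instances   ${new_monthly:,.0f}/mo")
print(f"saved:    ${current_monthly - new_monthly:,.0f}/mo "
      f"({(current_monthly - new_monthly) / current_monthly:.0%})")
```

Swap in real rates once GA pricing lands; the point of the exercise is that the instance-count lever is visible before pricing is.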
For batch fleets, the calculus is simpler: higher per-instance throughput shortens job walls, which reduces total instance-hours. If you’re using Spot for background workloads, you also expand your suitable instance pools by including M9g, improving Spot fulfillment rates.
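The batch version of the same arithmetic, again with placeholder numbers:

```python
work_units = 1_000_000    # records, renders, or test shards per run
per_instance_rate = 500   # units/hour on the current generation (placeholder)
uplift = 0.20             # assumed per-instance throughput gain

hours_now = work_units / per_instance_rate
hours_new = work_units / (per_instance_rate * (1 + uplift))
print(f"instance-hours per run: {hours_now:,.0f} -> {hours_new:,.0f} "
      f"({1 - hours_new / hours_now:.0%} fewer)")
```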
Security, isolation, and compliance
Graviton5 keeps always-on memory encryption and per-vCPU caches, reducing side-channel blast radius. The Nitro System’s new Isolation Engine extends the long-running hardware/firmware separation story with formal verification. That phrase matters in security reviews: you can point auditors to a mechanism designed to prove isolation properties, not just assert them. Add this to your vendor risk questionnaires and security architecture docs.
Operationally, revalidate your agent stack on arm64: EDR, metrics, logs, and tracing. Most agents provide arm64 builds; pin versions and hash them. If a vendor lags, either isolate the service or replace the agent (e.g., native OpenTelemetry collectors for metrics/traces plus a lightweight syslog shipper).
Gotchas and edge cases
• Native extensions: Image processing (Sharp), PDF libs, and some cryptography stacks may require arm64 prebuilds. Build times spike if you compile on every CI run—use a warm arm64 build farm or cross-compile in Docker Buildx with cache mounts.
• JVM tuning: AArch64 has different warmup and inlining behavior. Run longer load tests and verify steady-state. For high-throughput APIs, ZGC on arm64 is often a net win; for low-latency services, G1 with tuned region sizes may still be best.
• Databases: Managed services (Aurora, RDS) abstract the underlying host, but instance families still matter for performance characteristics. Test read-replica lag on arm64 before promoting any writer.
• Toolchains: Older GCC/Clang and musl versions can kneecap performance. Standardize your base images on modern distros with recent compilers and linkers.
• Ecosystem parity: Most observability and security vendors are there, but check your long tail—feature parity on arm64 sometimes lags behind amd64 by a minor version.
The practical checklist
Use this page during standups. If an item isn’t checked, don’t cut over.
• CI publishes multi-arch images for every service (amd64, arm64).
• Base images pinned to arm64-capable tags; native dependencies verified.
• Canary environment mirrors production autoscaling and sidecars.
• 7‑day before/after metrics captured (CPU, latency histograms, EBS, network).
• JVM/Node/Go/Python tuned for arm64; regression tests green under load.
• Observability agents and security controls validated on arm64.
• Capacity plan and rollback procedure documented for each service.
Where Graviton5 shines first
Start with services that are CPU-bound, horizontally scaled, and user-facing:
• Java/Kotlin APIs (Spring, Quarkus, Micronaut)
• Node.js/TypeScript backends (Express, Fastify, NestJS)
• Go microservices with heavy JSON/Protobuf encode/decode
• Redis/Memcached fleets and Kafka consumers
• Build/test runners and CI autoscaling groups
Move ML training/inference and media workloads later if they lean on x86-optimized instructions you can’t replace yet.
How this fits your 2026 platform goals
Zooming out, Graviton5 aligns with two strategic trends: agentic AI services that call your APIs more frequently, and higher baseline traffic from richer client experiences. More efficient compute and fatter pipes mean you can keep latency budgets intact without ballooning cost. If you’re evaluating custom AI model strategies, the savings here free headroom for pilot projects—our take on the build‑vs‑buy decision is in the Nova Forge build vs buy guide.
What to do next
• Pick your Pioneer 10 and open tickets today. Aim for one pilot per team.
• Flip CI to multi-arch, then block merges that introduce amd64-only images.
• Schedule a one-week canary on M9g for at least two services before December ends.
• Capture before/after dashboards and a short memo for finance—wins get budget.
• If you want help scoping or executing a migration, our cloud modernization services and real-world case studies show how we de-risk these moves at pace. You can also browse more engineering playbooks on our blog.
FAQ for execs and stakeholders
Q: When will M9g be generally available?
A: AWS opened the preview on December 4, 2025. GA timing typically follows preview feedback; plan pilots now and schedule production cutovers once GA lands in your regions.
Q: Do we need new AMIs and golden images?
A: Yes—publish arm64 AMIs with your hardened baseline. Keep amd64 images for legacy services during the transition.
Q: How do we quantify risk?
A: Track three leading indicators during canaries: error rates, p99 latency, and CPU steal. Add a kill switch (traffic shift to x86) and a rollback runbook per service.
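If the canary is steered with weighted DNS, the kill switch can be a single script. This sketch assumes Route 53 weighted records; the zone ID, record names, and targets are placeholders, and if you steer traffic at the load balancer or service mesh instead, the equivalent is a weight change there.

```python
# Kill switch: drain the arm64 canary by setting its weight to 0.
# Zone ID, record names, and targets are placeholders.
import boto3

route53 = boto3.client("route53")

def weighted_record(identifier: str, target: str, weight: int) -> dict:
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": "api.example.com.",
            "Type": "CNAME",
            "SetIdentifier": identifier,
            "Weight": weight,
            "TTL": 60,
            "ResourceRecords": [{"Value": target}],
        },
    }

route53.change_resource_record_sets(
    HostedZoneId="ZEXAMPLE1234567",  # placeholder hosted zone
    ChangeBatch={
        "Comment": "Rollback: shift all traffic back to x86",
        "Changes": [
            weighted_record("arm64-canary", "arm64.api.internal.example.com", 0),
            weighted_record("x86-stable", "x86.api.internal.example.com", 100),
        ],
    },
)
```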
Q: Are we locked into Arm forever?
A: No. Multi-arch everything. Treat architecture as a deployment variable, not a rewrite. That optionality is good platform hygiene.
If you’ve read this far, you’re ready. Graviton5 gives you real headroom—compute, bandwidth, and security. The organizations that bank the gains first will have more resources to invest in product, AI, and growth. Let’s get practical and make the next 90 days count. If you want a second set of hands, we’re a message away on the contact page.
