Amazon S3 Vectors is now generally available—and it’s not a tweak, it’s a new storage primitive for AI. The headline: Amazon S3 Vectors supports up to two billion vectors per index with native similarity search, integrates directly with Bedrock Knowledge Bases and Amazon OpenSearch Service, and promises major cost reductions for vector-heavy workloads. For teams building retrieval‑augmented generation (RAG), semantic search, recommendations, or agent memory, Amazon S3 Vectors changes both your architecture and your spreadsheet. (aws.amazon.com)
What exactly shipped—and why developers should care
At GA (announced December 2, 2025), Amazon S3 Vectors adds a new bucket type, the vector bucket, with vector indexes inside. You get managed APIs to put, delete, list, and query vectors; server-side encryption with SSE‑S3 or your own KMS keys; tagging for ABAC; and PrivateLink for private network paths. GA expands from five preview Regions to 14 and bumps scale 40×: from tens of millions to two billion vectors per index, with up to 10,000 indexes per bucket. Latency for frequent queries has dropped into the ~100 ms range, and AWS quotes up to 90% lower cost than alternative approaches. (aws.amazon.com)
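To make that API surface concrete, here's a minimal sketch using the boto3 `s3vectors` client. The bucket, index, and key names are placeholders, and exact parameter shapes can vary by SDK version, so treat it as orientation rather than copy‑paste:

```python
import boto3

# Placeholder names; the Region must be one of the GA Regions.
s3v = boto3.client("s3vectors", region_name="us-east-1")

# Put a small batch of vectors (embedding values shortened for illustration;
# real vectors must match the index dimension).
s3v.put_vectors(
    vectorBucketName="docs-vectors",
    indexName="support-articles",
    vectors=[
        {
            "key": "doc-123#chunk-04",
            "data": {"float32": [0.12, -0.08, 0.33]},
            "metadata": {"tenant": "acme", "lang": "en"},
        }
    ],
)

# Similarity query with a simple metadata filter.
resp = s3v.query_vectors(
    vectorBucketName="docs-vectors",
    indexName="support-articles",
    queryVector={"float32": [0.10, -0.07, 0.31]},
    topK=10,
    filter={"tenant": "acme"},
    returnMetadata=True,
    returnDistance=True,
)
for match in resp["vectors"]:
    print(match["key"], match.get("distance"))
```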
That combination—S3 durability and price profile with native vector search—moves a lot of RAG use cases out of “specialty database first” into an S3‑centric design, especially when your corpus is huge, read patterns are skewed, and only a slice of vectors are “hot” at any time. (aws.amazon.com)
Amazon S3 Vectors vs. purpose‑built vector databases: which one, when?
Here’s the thing: S3 Vectors is not pretending to be a full‑blown vector database with every indexing strategy, algorithm variant, distributed query planner, or exotic filter operation. It’s object storage with native vector search and basic filtering—durable, cheap, massive. That makes it compelling for:
- Billion‑scale archives where 90% of vectors are cold most of the time.
- RAG knowledge bases that need simple, predictable APIs, S3 durability, and tight Bedrock integration.
- Compliance‑heavy environments that prefer S3 encryption, tagging, and lifecycle governance as the control plane.
Traditional vector DBs still shine when you need ultra‑low latency at high QPS, advanced ANN algorithms and query operators, high‑rate upserts, or co‑located scalar features for complex ranking. The obvious pattern is tiering: keep a small hot set in OpenSearch (or another vector engine) and a massive cold set in S3 Vectors, promoting on demand. AWS now supports exactly this hybrid flow by letting OpenSearch offload storage and hydrate from S3 Vectors. (aws.amazon.com)
Key limits, numbers, and behaviors you’ll actually hit
Before you draw boxes in Lucidchart, read the fine print. As of GA, a single vector index supports up to two billion vectors, with dimensions from 1 up to 4,096, and Top‑K up to 100 per query. Per‑index throughput caps include up to 1,000 combined Put/Delete requests per second and up to 2,500 vectors inserted or deleted per second, with request payloads up to 20 MiB. Metadata per vector is capped (e.g., 50 keys, up to 40 KB total), with up to 2 KB filterable. These numbers matter when you batch or stream updates and when you design your filter schema. (docs.aws.amazon.com)
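A quick back‑of‑the‑envelope helps here: at common embedding sizes, the 20 MiB request cap is rarely the first limit you hit; the per‑second ceilings are. A rough sizing sketch (numbers are illustrative):

```python
# Rough PutVectors batch sizing, ignoring keys and metadata overhead.
dim = 1024                                # a common embedding dimension (max is 4,096)
bytes_per_vector = dim * 4                # float32
max_payload = 20 * 1024 * 1024            # 20 MiB request cap
print(max_payload // bytes_per_vector)    # ~5,242 vectors fit by size alone,
                                          # but the 2,500 vectors/sec ceiling bites first
```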
On latency, AWS calls out that infrequent queries return in under a second while frequent ones land around 100 ms, which is good enough for a lot of agent memory, support search, and background ranking, especially if you cache and warm intelligently. (aws.amazon.com)
Architecture patterns that make sense now
Two patterns we see working immediately:
1) Hot/warm/cold vectors without the pain
Use OpenSearch (or another high‑QPS vector engine) as your hot tier, S3 Vectors as warm/cold. Your ingestion pipeline writes embeddings to S3 Vectors by default; a promotion job mirrors the most‑used slice into OpenSearch based on access logs and business rules. Queries hit a smart router: try hot, fall back to S3 Vectors, optionally re‑rank. This minimizes expensive capacity while keeping recall where you need it. (aws.amazon.com)
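A minimal sketch of that router, assuming a caller‑supplied hot‑tier search function (for example an OpenSearch k‑NN query) and placeholder bucket and index names:

```python
from typing import Callable, List, Optional
import boto3

s3v = boto3.client("s3vectors")

def search_with_fallback(
    query_vec: List[float],
    top_k: int,
    hot_search: Callable[[List[float], int], Optional[List[dict]]],
) -> List[dict]:
    """Try the hot tier first; fall back to S3 Vectors on a miss or thin result."""
    hot_hits = hot_search(query_vec, top_k)  # e.g. an OpenSearch k-NN query; may return None
    if hot_hits and len(hot_hits) >= top_k:
        return hot_hits

    # Warm/cold path: query the S3 Vectors index directly.
    resp = s3v.query_vectors(
        vectorBucketName="docs-vectors",      # placeholder names
        indexName="support-articles",
        queryVector={"float32": query_vec},
        topK=top_k,
        returnMetadata=True,
        returnDistance=True,
    )
    return resp["vectors"]
```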
2) Bedrock‑first RAG and agents
If you’re already in Amazon Bedrock for models, Knowledge Bases can now point straight at S3 Vector indexes, avoiding a separate vector store to manage. That slashes glue code and governance surface area (keys, VPC endpoints, backups, audits). If you’re evaluating Bedrock Agents or agent frameworks, centralizing long‑term memory in S3 Vectors plus a tiny hot cache is a sane default. (aws.amazon.com)
Zooming out, S3 Vectors lines up with broader AWS moves to make vector and unstructured data a first‑class citizen in S3—alongside last year’s S3 50 TB object bump that simplifies storing huge model shards, high‑res video, and seismic data as single objects. If that change is still on your backlog, here’s our earlier take on how it re‑wires pipeline design: S3 50TB object limit analysis. (aws.amazon.com)
People also ask: Is Amazon S3 Vectors a database?
Short answer: it’s S3 with vector‑native indexing and search, not a general‑purpose vector database. You don’t provision clusters, you don’t micromanage shard rebalancing, and you accept the API surface and limits S3 provides. For massive, cost‑sensitive corpora feeding Bedrock or an OpenSearch tier, that’s a feature, not a bug. (aws.amazon.com)
People also ask: How much will I save?
AWS claims “up to 90%” lower cost to upload, store, and query vectors compared to specialized vector databases. Your mileage depends on QPS, recall, and how much of your data is cold. If your pattern is lots of data, moderate queries, and tight budgets, S3 Vectors will likely pencil out. If you need sub‑50 ms latency on every query and 24/7 heavy writes, keep a dedicated engine in the loop. (aws.amazon.com)
The VECTOR‑FIT checklist for S3 Vectors readiness
Use this to avoid expensive do‑overs:
- Volume: Do you expect >100M vectors in the next year? If yes, S3 Vectors fits the scale profile.
- Events: Are writes bursty but not 24/7 sustained? Fit. If you need continuous high‑rate upserts, test limits first.
- Queries: Is 100 ms–1 s acceptable for most requests? If not, plan a hot tier.
- Tenancy: Will you need per‑tenant KMS keys and ABAC tags for access and chargeback? S3 Vectors supports both.
- Organization: Can you live within 10,000 indexes per bucket and 50 metadata keys per vector? If not, rethink your schema or shard by bucket.
- Recall: Will Top‑K up to 100 meet business outcomes? If you need deep candidate sets, combine with rerankers.
- Footprint: Are you already in Bedrock Knowledge Bases or OpenSearch? If yes, integration savings compound.
If you checked 5+ boxes, pilot S3 Vectors.
A pragmatic migration plan (without breaking prod)
Step 1: Inventory and score your vector use cases
List indexes, dimensions, QPS, write rates, SLA, and compliance. Flag what must remain ultra‑low latency and what can tolerate 100–800 ms.
Step 2: Normalize embeddings and metadata
Dimension caps matter (max 4,096). If you plan multimodal embeddings later, reserve headroom. Flatten your metadata—50 keys max; push non‑filterable fields to object payloads or a side store. (docs.aws.amazon.com)
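One way to enforce that discipline at write time is a small shaping helper. Note that in S3 Vectors the filterable vs. non‑filterable split is declared on the index when you create it; this sketch just keeps the filterable slice compact and folds everything else into one opaque field (key names are illustrative):

```python
import json

# Hypothetical split: a handful of stable, low-cardinality keys stay filterable;
# everything else rides along as a non-filterable blob or lives in a side store.
FILTERABLE_KEYS = {"tenant", "lang", "doc_type", "source"}

def shape_metadata(raw: dict) -> dict:
    filterable = {k: v for k, v in raw.items() if k in FILTERABLE_KEYS}
    rest = {k: v for k, v in raw.items() if k not in FILTERABLE_KEYS}
    # Keep the filterable portion comfortably under the ~2 KB filterable budget.
    assert len(json.dumps(filterable)) < 1_500, "filterable metadata too large"
    return {**filterable, "extra": json.dumps(rest)}  # 'extra' is an illustrative key
```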
Step 3: Create your first vector bucket and index
Stand up a vector bucket with KMS (SSE‑KMS) if you need tenant‑level keys. Tag aggressively for ABAC and cost allocation from day one. Enable PrivateLink in regulated environments. (aws.amazon.com)
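A minimal provisioning sketch with the boto3 `s3vectors` client, assuming a customer‑managed KMS key; the names, dimension, and distance metric are placeholders you would match to your embedding model, and exact field names may differ by SDK version:

```python
import boto3

s3v = boto3.client("s3vectors", region_name="us-east-1")

# Vector bucket encrypted with a customer-managed KMS key (ARN is a placeholder).
s3v.create_vector_bucket(
    vectorBucketName="acme-vectors",
    encryptionConfiguration={
        "sseType": "aws:kms",
        "kmsKeyArn": "arn:aws:kms:us-east-1:123456789012:key/your-key-id",
    },
)

# Index sized for your embedding model; declare which metadata keys are non-filterable.
s3v.create_index(
    vectorBucketName="acme-vectors",
    indexName="support-articles",
    dataType="float32",
    dimension=1024,
    distanceMetric="cosine",
    metadataConfiguration={"nonFilterableMetadataKeys": ["extra"]},
)
```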
Step 4: Batch backfill, stream deltas
Use PutVectors in batches up to request limits; monitor the 1,000 requests/sec and 2,500 vectors/sec per‑index ceilings. For hot slices, mirror into OpenSearch and verify recall parity before cutover. (docs.aws.amazon.com)
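Here's a hedged backfill sketch: batch, retry on throttling with backoff, and keep batch sizes comfortably under the payload and per‑second limits. The throttling error codes checked below are illustrative; confirm the SDK's actual exceptions in your Region:

```python
import itertools
import time
import boto3
from botocore.exceptions import ClientError

s3v = boto3.client("s3vectors")
BATCH_SIZE = 200  # tune against payload size and per-second limits for your dimension

def backfill(vectors):
    """vectors: iterable of dicts shaped like PutVectors 'vectors' entries."""
    it = iter(vectors)
    while batch := list(itertools.islice(it, BATCH_SIZE)):
        for attempt in range(5):
            try:
                s3v.put_vectors(
                    vectorBucketName="acme-vectors",   # placeholder names
                    indexName="support-articles",
                    vectors=batch,
                )
                break
            except ClientError as err:
                # Back off on throttling; re-raise anything else.
                code = err.response["Error"]["Code"]
                if code not in ("ThrottlingException", "TooManyRequestsException"):
                    raise
                time.sleep(2 ** attempt)
```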
Step 5: Wire Bedrock Knowledge Bases
Point Bedrock to your S3 Vector index. Validate end‑to‑end latency and result quality with production prompts, not synthetic queries. (aws.amazon.com)
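Validation can be as simple as replaying real prompts through the Knowledge Bases Retrieve API and inspecting scores and passages; the IDs below are placeholders:

```python
import boto3

# Replay production prompts against the Knowledge Base that fronts your S3 Vector index.
agent_rt = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

resp = agent_rt.retrieve(
    knowledgeBaseId="KBID1234",  # placeholder
    retrievalQuery={"text": "How do I rotate the tenant KMS key?"},
    retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 5}},
)
for result in resp["retrievalResults"]:
    print(result["score"], result["content"]["text"][:120])
```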
Step 6: Roll out a router
Route queries to hot or warm tiers, cache aggressively, and track tail latency (p95/p99). Lazy‑promote hot IDs nightly based on access patterns.
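The promotion job itself can stay boring: count accesses, take the top N keys, pull those vectors back out of S3 Vectors, and hand them to whatever writes your hot tier. A sketch with placeholder names and a caller‑supplied hot‑tier writer:

```python
from collections import Counter
import boto3

s3v = boto3.client("s3vectors")

def promote_hot_ids(access_log_keys, write_to_hot_tier, top_n=10_000):
    """access_log_keys: iterable of vector keys seen in queries over the last day.
    write_to_hot_tier: callable that indexes (key, vector, metadata) into the hot tier."""
    hot_keys = [k for k, _ in Counter(access_log_keys).most_common(top_n)]
    # Fetch vectors for the hot slice in small chunks and mirror them.
    for i in range(0, len(hot_keys), 100):
        resp = s3v.get_vectors(
            vectorBucketName="acme-vectors",     # placeholders
            indexName="support-articles",
            keys=hot_keys[i : i + 100],
            returnData=True,
            returnMetadata=True,
        )
        for v in resp["vectors"]:
            write_to_hot_tier(v["key"], v["data"]["float32"], v.get("metadata", {}))
```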
Step 7: Cost and failure drills
Turn on detailed billing tags, run disaster tests (index deletion protections, restore from source), and rehearse regional failover if your RTO requires it.
Operational gotchas you’ll thank yourself for spotting now
Filters are powerful but capped. Design a compact, stable schema early—don’t ship a proliferation of ad‑hoc fields that burn your 50‑key budget. If you need complex filters or joins with tabular features, keep a companion store (OpenSearch, DynamoDB, or a feature store) and merge results downstream. (docs.aws.amazon.com)
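In practice that looks like a thin enrichment step: filter on a couple of stable keys in S3 Vectors, then join richer attributes from the companion store by vector key. A sketch with a hypothetical DynamoDB table:

```python
import boto3

s3v = boto3.client("s3vectors")
ddb = boto3.resource("dynamodb").Table("doc-features")   # hypothetical companion table

def search_and_enrich(query_vec, tenant, top_k=20):
    # Keep the S3 Vectors filter to a few stable, low-cardinality keys...
    hits = s3v.query_vectors(
        vectorBucketName="acme-vectors",      # placeholder names
        indexName="support-articles",
        queryVector={"float32": query_vec},
        topK=top_k,
        filter={"tenant": tenant},
        returnMetadata=True,
    )["vectors"]
    # ...and merge richer attributes from the companion store downstream.
    enriched = []
    for h in hits:
        item = ddb.get_item(Key={"vector_key": h["key"]}).get("Item", {})
        enriched.append({**h, "features": item})
    return enriched
```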
Watch ingest backpressure. If you plan large bursts, shard across multiple indexes and pre‑partition by customer or content type to stay under per‑index limits. Choose batch sizes that balance network efficiency and payload caps. (docs.aws.amazon.com)
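A deterministic index picker keeps the sharding predictable: hash the tenant (or content type) to one of N indexes so writes spread evenly. The shard count below is illustrative:

```python
import hashlib

NUM_SHARDS = 8  # size so each shard stays under per-index write limits

def index_for(tenant_id: str) -> str:
    """Deterministically map a tenant (or content type) to one of N index names."""
    shard = int(hashlib.sha256(tenant_id.encode()).hexdigest(), 16) % NUM_SHARDS
    return f"support-articles-{shard:02d}"
```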
Pick Regions deliberately. GA covers 14 Regions; align with your model endpoints, EKS clusters, and data residency rules to avoid cross‑Region chatter. (aws.amazon.com)
Finally, test relevance with your real prompts. Don’t swap vector stores on offline benchmarks alone—measure business KPIs: CSAT, first‑contact resolution, time‑to‑answer, or conversion rate for recommendations.
Data gravity just increased: S3 now stores bigger single objects
Separate but related: AWS increased the maximum S3 object size 10× from 5 TB to 50 TB across all storage classes. For AI teams, that simplifies storing giant training shards, long‑form video, or high‑fidelity maps without splitting files—and it pairs neatly with S3 Vectors when you want vector metadata near the source payloads. If you’re updating pipelines, our guide to the 50 TB shift breaks down multipart upload, transfer tuning, and lifecycle strategies. Explore the 50 TB pipeline plan. (aws.amazon.com)
Security and governance: how this fits your risk model
Many enterprises prefer S3 as the control plane because encryption, tagging, access policies, and audit pathways are familiar. With S3 Vectors you can apply customer‑managed KMS keys per vector bucket or per index, use tags for attribute‑based access control, and keep traffic private via PrivateLink. For highly regulated workloads, that alignment with existing S3 controls can cut weeks from security reviews. (aws.amazon.com)
Let’s get practical: a 30‑day pilot plan
Here’s a minimal investment plan that derisks adoption and gives execs real numbers.
Week 1: Scope and baseline
Pick one RAG use case with the potential to exceed 50M vectors. Capture current latency, recall@K, infra cost, and operator toil hours. Document embedding dim, metadata shape, and peak write rates.
Week 2: Stand up S3 Vectors + dual‑write
Create a vector bucket and index, enable SSE‑KMS, and dual‑write from your embedding service. Start a small OpenSearch hot tier if you need a cache. Validate Bedrock Knowledge Bases integration if you’re on Bedrock. (aws.amazon.com)
Week 3: Query router, observability, and promotions
Build a router that checks hot cache first, then S3 Vectors. Add dashboards for p50/p95/p99, error codes, throttle rates, and per‑tenant costs. Nightly job promotes hot IDs to the cache tier based on request logs.
Week 4: Bake‑off and decision
Run side‑by‑side against your current vector store for seven days of real traffic. Compare cost and KPI impacts. If results hit targets, plan a staged migration with a kill switch.
But what about your agents and governance?
As teams graduate from simple RAG to task‑performing agents, memory size balloons and guardrails matter. Keeping long‑term memory in S3 Vectors plus a small hot tier makes scale and cost manageable, while policy enforcement stays close to your existing S3 and Bedrock governance. If you’re designing agent policies and evaluation loops, our deep dive on Bedrock AgentCore governance explains how to test and ship safely.
What to do next (developers)
- Prototype with one vector index per major corpus; stay under per‑index throughput limits until you size the right sharding.
- Adopt a compact filter schema; avoid dynamic metadata explosions.
- Measure business metrics, not just recall; wire a quick re‑ranker for quality.
- If you need instant responses, add a hot tier; use S3 Vectors as the system of record.
- Plan capacity tests at realistic burst rates; watch error and throttle codes.
What to do next (engineering leaders)
- Stand up a 30‑day pilot with a clear success bar: target QPS, p95 latency, cost per 1M queries, and operator hours.
- Choose Regions to co‑locate with models and EKS to avoid cross‑Region costs.
- Align security: KMS strategy, ABAC tags, and PrivateLink from day one.
- Revisit your multicloud posture; S3 Vectors strengthens the gravitational pull toward AWS—account for that in your roadmap. Our multicloud playbook covers the trade‑offs.
- Budget for the hot tier, not just S3; latency SLAs often demand it.
Final thought: the center of gravity moves to S3
For a decade, “store in S3, index elsewhere” was the RAG default. With Amazon S3 Vectors, a lot of teams can flip that mental model: store and search in S3 by default, then add a small high‑octane cache where it pays off. It’s simpler governance, fewer moving parts, and more predictable costs at truly massive scale. If you want help pressure‑testing your architecture or running a pilot, we’ve done this before and know where the dragons hide—reach us via cloud and AI services or contact the team.
