Amazon S3 Vectors is now generally available, and it’s not just another new logo in the console—it’s a genuine pivot point for how we persist embeddings and run vector search in production. If you’re shipping RAG, agent memory, recommendation systems, or similarity search, Amazon S3 Vectors lets you keep vectors inside S3 with billion‑scale indexes and predictable durability instead of standing up yet another specialized store. (aws.amazon.com)
Here’s the thing: when storage, durability, IAM, lifecycle, and replication are already solved by S3, moving vectors there simplifies both operations and cost models. With GA, AWS published concrete limits and performance targets you can plan around—this isn’t hand‑wavy preview territory anymore. (aws.amazon.com)
What just shipped—and why it matters
As of December 2, 2025, S3 Vectors is GA with up to 2 billion vectors per index, support for 10,000 indexes per bucket, and hot‑path queries that can land around the 100 ms mark for frequently accessed data, while infrequent queries remain under a second. That’s a 40× jump from the 50 million vectors per index cap during preview. (aws.amazon.com)
AWS also states S3 Vectors can reduce total costs to upload, store, and query vectors by up to 90% versus specialized vector databases. You also inherit S3’s durability model and security posture, with server‑side encryption enabled by default and optional per‑index AWS KMS keys—a big deal for multi‑tenant apps that need strict isolation and auditability. (aws.amazon.com)
Finally, S3 Vectors integrates directly with Bedrock Knowledge Bases and can be paired with OpenSearch for hybrid search strategies. The net: lower platform complexity without giving up RAG features your application depends on. (aws.amazon.com)
Amazon S3 Vectors vs. a vector database: which should you choose?
There’s no silver bullet. Specialized vector databases still shine if you need ultra‑low latency across complex hybrid queries (dense + sparse retrieval), heavy aggregations, or fine‑grained control over HNSW‑style index parameters. But if you’d rather bias toward simplicity, durability, and native cloud guardrails, S3 Vectors is now the default choice for many teams.
Choose S3 Vectors when:
- Your embedding workload is large, write‑light to moderate, and read‑heavy with predictable patterns.
- You’re already on S3 and want IAM, encryption, lifecycle, replication, and cost allocation tags to “just work.”
- You plan to use Bedrock Knowledge Bases or a serverless stack and don’t want to run or tune a separate database layer.
Stick with a vector database when:
- You’ve got hard p99 goals under ~50–75 ms across diverse queries, or advanced filtering/scoring that depends on custom index types.
- Your product requires cross‑index joins, sophisticated hybrid scoring, or high write rates that exceed per‑index throughput ceilings.
Plenty of teams will land in the middle: front most queries through S3 Vectors and keep a smaller, performance‑tuned vector DB for latency‑critical experiences. Because S3 Vectors lives in S3, you can design this hybrid on your terms.
Designing with Amazon S3 Vectors: a build blueprint
Let’s get practical. Here’s a step‑by‑step blueprint we’ve used with teams standing up vector search on S3 in weeks, not quarters.
1) Define your retrieval contract
Write down the top three retrieval questions your product must answer, the embeddings you’ll use (model, dimension, update cadence), and your p50/p95 latency budgets per query. This keeps the index layout honest and avoids over‑engineering.
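One lightweight way to make that contract enforceable is to encode it as data your team reviews in code. The sketch below is illustrative: the field names and the placeholder model identifier are assumptions, not S3 Vectors API constructs; only the 1–4,096 dimension bound comes from the documented limits.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RetrievalContract:
    """One record per retrieval question the product must answer."""
    question: str    # the retrieval question, in plain language
    model: str       # embedding model identifier (placeholder value below)
    dimension: int   # embedding dimension; S3 Vectors supports 1-4,096
    p50_ms: int      # median latency budget for this query path
    p95_ms: int      # tail latency budget for this query path

    def __post_init__(self) -> None:
        if not 1 <= self.dimension <= 4096:
            raise ValueError("dimension must be between 1 and 4,096")

contract = RetrievalContract(
    question="top-10 similar support tickets for an open case",
    model="example-embedding-model-v2",  # hypothetical ID, not a real model
    dimension=1024,
    p50_ms=150,
    p95_ms=300,
)
```

Reviewing changes to these records in pull requests keeps latency budgets and embedding choices visible instead of buried in config.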
2) Choose dimension size and embedding policy
S3 Vectors supports 1–4,096 dimensions per vector. Don’t just default to 1536 because everyone else does—align dimension size with your model and memory behavior. If you expect to change embedding models, version your vectors in metadata keys and keep a deprecation plan. (docs.aws.amazon.com)
3) Partition with intent: index per domain, not per table
Indexes are your unit of scale and quota. With 2 billion vectors per index and up to 10,000 indexes per bucket, you can partition by customer, data domain, or recency tier (hot/warm/cold). Start wide rather than tall; you can always merge indexes later. (aws.amazon.com)
4) Model metadata for filtration, not for comfort
Each vector can carry up to 50 metadata keys, with ~2 KB of filterable metadata and 40 KB total. Decide which fields are actually filterable and push the rest to your object store or relational store to avoid query bloat. (docs.aws.amazon.com)
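A pre‑flight check in your ingest pipeline can catch budget violations before they reach the API. This is a sketch against the limits quoted above; the exact byte accounting AWS applies may differ from a plain JSON serialization, so treat it as a guardrail, not a guarantee.

```python
import json

MAX_KEYS = 50                 # metadata keys per vector
MAX_FILTERABLE_BYTES = 2048   # ~2 KB filterable metadata budget
MAX_TOTAL_BYTES = 40 * 1024   # 40 KB total metadata per vector

def check_metadata(filterable: dict, non_filterable: dict) -> list[str]:
    """Return a list of budget violations (empty list means within limits)."""
    problems = []
    total_keys = len(filterable) + len(non_filterable)
    if total_keys > MAX_KEYS:
        problems.append(f"{total_keys} keys exceeds limit of {MAX_KEYS}")
    f_bytes = len(json.dumps(filterable).encode())
    if f_bytes > MAX_FILTERABLE_BYTES:
        problems.append(f"filterable metadata is {f_bytes} B "
                        f"(budget {MAX_FILTERABLE_BYTES} B)")
    t_bytes = f_bytes + len(json.dumps(non_filterable).encode())
    if t_bytes > MAX_TOTAL_BYTES:
        problems.append(f"total metadata is {t_bytes} B "
                        f"(budget {MAX_TOTAL_BYTES} B)")
    return problems
```

Running this in CI against representative documents is a cheap way to enforce the "references, not payloads" discipline described below.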
5) Throughput math beats guesswork
Per index, you can make up to 1,000 Put/Delete requests per second with a combined insert+delete rate of up to 2,500 vectors per second. Reads support up to 100 results per query. Use these ceilings to back into how many parallel indexes you need for your ingest SLA. (docs.aws.amazon.com)
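The math can be as simple as backing the index count out of the per‑index ceiling. A sketch, assuming you keep 30% headroom below the documented 2,500 vectors/sec combined limit (the headroom fraction is a judgment call, not an AWS number):

```python
import math

PER_INDEX_VECTORS_PER_SEC = 2500  # documented combined insert+delete ceiling

def indexes_needed(total_vectors: int, window_hours: float,
                   headroom: float = 0.7) -> int:
    """Indexes required to finish an ingest in `window_hours`, running
    each index at `headroom` x the per-index throughput ceiling."""
    required_rate = total_vectors / (window_hours * 3600)
    usable_rate = PER_INDEX_VECTORS_PER_SEC * headroom
    return max(1, math.ceil(required_rate / usable_rate))
```

For example, a 100‑million‑vector backfill over an 8‑hour window works out to 2 parallel indexes at 70% headroom; a leisurely 24‑hour window fits in 1.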
6) Hot vs. warm routing
The GA notes indicate hot paths can land around 100 ms, while infrequent queries remain sub‑second. Keep frequently accessed slices in “hot” indexes and push the long tail to “warm” ones. Your app can route users based on recency or popularity signals. (aws.amazon.com)
7) Security and tenancy by design
Encryption is on by default (SSE‑S3). For regulated tenants, set per‑index KMS keys and enforce attribute‑based access control (ABAC) with tags at the vector bucket and index levels. Log all access and attach your data retention policy to the bucket lifecycle rules. (aws.amazon.com)
8) Retrieval with Bedrock Knowledge Bases
If you’re already on Bedrock, wire S3 Vectors in as the vector store for Knowledge Bases. It trims infra work and makes it easier to swap models later. For hybrid search, keep OpenSearch in the loop for sparse signals and let S3 Vectors carry the dense load. (aws.amazon.com)
9) Cost anatomy: what actually moves your bill
Your bill will be driven by: vector storage volume, per‑request charges, and data movement (if any). The GA release frames potential up to 90% savings compared to specialized vector databases, but your real gains come from shedding cluster management, backups, and separate encryption keys—not just raw storage. Model it with your actual QPS and index sizes. (aws.amazon.com)
10) Migration playbook (from a vector DB)
Start dual‑write for 2–4 weeks, then backfill from your existing store into S3 Vectors per index, verifying cosine/inner‑product parity on a fixed evaluation set. Run shadow reads in production and compare top‑K overlap at p50/p95. Only then cut traffic. Keep the old store warm for 1–2 release cycles, then archive.
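For the shadow‑read comparison, a plain top‑K overlap score is usually enough to track parity between the old store and S3 Vectors. A minimal sketch:

```python
def topk_overlap(old_ids: list[str], new_ids: list[str], k: int = 10) -> float:
    """Fraction of the old store's top-k results the new store also returns.
    1.0 means identical result sets (order ignored); track this at p50/p95."""
    old_top, new_top = set(old_ids[:k]), set(new_ids[:k])
    if not old_top:
        return 1.0
    return len(old_top & new_top) / len(old_top)
```

Log this per shadow query, then gate the traffic cutover on the distribution (e.g., p95 overlap above a threshold you pick on the evaluation set) rather than on anecdotes.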
How big is big? Capacity and performance math you can trust
Say you’ve got 500 million product vectors at 1536 dimensions with 10% daily churn and want p95 under 300 ms. You could place them in a single index (it fits under the 2B ceiling), but you’ll likely shard into 5–10 indexes by category or recency to keep ingest below the 2,500 vectors/sec per‑index limit and to spread hot traffic. Queries that hit the hot shard can ride closer to the ~100 ms target; cold shards tolerate higher latency without hurting UX. (aws.amazon.com)
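The arithmetic behind that sizing is worth writing out. Steady daily churn alone fits comfortably in one index; it is ingest bursts and hot‑traffic spreading that drive the shard count. The 4‑hour burst window below is an assumption about when your churn actually lands:

```python
import math

total_vectors = 500_000_000
daily_churn = int(total_vectors * 0.10)            # 50M vector updates per day
steady_rate = daily_churn / 86_400                 # ~579 vectors/sec over 24h
burst_rate = daily_churn / (4 * 3600)              # ~3,472 vectors/sec if churn
                                                   # lands in a 4h window (assumption)
shards_for_burst = math.ceil(burst_rate / 2_500)   # indexes needed for the burst alone
```

That gives 2 indexes just to absorb the write burst; distributing hot query traffic and partitioning by category or recency is what pushes practical deployments toward the 5–10 range.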
For multi‑tenant SaaS, consider index‑per‑tenant when customers demand hard isolation or bring‑your‑own‑key encryption. Otherwise, group smaller tenants by region and compliance regime with ABAC and per‑tenant metadata filters; you’ll get better cache locality and fewer small, underutilized indexes. (aws.amazon.com)
Security, governance, and regional strategy
S3 Vectors inherits S3’s durability and security posture. With GA, you can tag vector buckets and indexes for ABAC and cost allocation, set default KMS keys at the bucket level, and even assign a dedicated customer‑managed key per index. That last bit is the lever many privacy teams need to green‑light consolidation. (aws.amazon.com)
Regional availability expanded from five preview Regions to fourteen at GA; confirm your Region is covered before committing your roadmap, especially for data residency. If you operate in new or edge Regions, plan for replication and routing. (aws.amazon.com)
People also ask
Is Amazon S3 Vectors a replacement for my vector DB?
Sometimes. If you don’t need exotic hybrid ranking, extremely low p99s under ~50–75 ms, or custom index types, S3 Vectors simplifies your stack and can materially lower cost. If your product is a search engine with complex scoring, you’ll probably keep a specialized DB for a subset of traffic.
Do I still need Bedrock Knowledge Bases?
Not strictly, but if you’re already in Bedrock, its Knowledge Bases integrate natively with S3 Vectors and reduce build time for RAG pipelines and agents. It’s a solid default unless you have bespoke retrieval logic. (aws.amazon.com)
What are the hard limits I should design around?
Start with these: up to 2 billion vectors per index, 10,000 indexes per bucket, 1–4,096 dimensions, 50 metadata keys per vector, ~2 KB filterable metadata budget, and up to 100 results per query. For ingest, plan around ~1,000 Put/Delete requests/sec and a combined 2,500 vectors/sec per index. (docs.aws.amazon.com)
Where 50 TB S3 objects fit into the picture
Right alongside S3 Vectors, AWS increased the max S3 object size from 5 TB to 50 TB. For AI teams, that makes it viable to store gigantic raw datasets (think: unsharded video corpora or seismic files) as single objects while maintaining lifecycle policies and replication. If you’re consolidating your AI data plane onto S3—vector indexes, raw corpora, and derived artifacts—this upgrade removes one of the last awkward seams. (aws.amazon.com)
We published a migration playbook for 50 TB objects with hands‑on tips for multipart uploads and lifecycle rules—you can read it here: S3 50TB Is Live: Your Migration Playbook.
The 30‑Day S3 Vectors rollout plan
Use this as a checklist you can paste into your tracker tomorrow.
Week 1: Scope and scaffolding
- Pick one high‑value retrieval use case (support bot, product search, or agent memory).
- Lock embedding model and dimension; design metadata schema for filtering.
- Create a vector bucket and 1–3 vector indexes in your target Region; wire IAM, logging, KMS (per‑index as needed). (aws.amazon.com)
Week 2: Ingest and evaluate
- Ingest a 5–10% sample using the PutVectors API at realistic throughput; record write ceilings vs. your SLAs. (docs.aws.amazon.com)
- Build an evaluation set; run shadow reads and compare top‑K stability across index configurations (hot/warm, partition keys, metadata filters).
Week 3: Wire to your app
- Integrate with Bedrock Knowledge Bases if applicable; otherwise, implement a retrieval facade that routes by recency/popularity. (aws.amazon.com)
- Set p95 alerts and dashboard p50/p95/p99 with sample queries; test failure modes (API timeouts, throttling, index not found).
Week 4: Productionize
- Turn on dual‑write from your current vector store; schedule nightly backfill and consistency checks.
- Run a progressive traffic shift: 10% → 25% → 50% → 100% as p95 stays green; keep rollback ready.
- Close the loop with security: confirm ABAC tags, bucket policies, and per‑index KMS keys align with tenant boundaries. (aws.amazon.com)
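The Week 4 progressive shift can be encoded so nobody advances a stage by hand while p95 is red. The stage ladder mirrors the plan above; the step‑back‑one‑stage rollback behavior is an assumption about your release process:

```python
STAGES = [0.0, 0.10, 0.25, 0.50, 1.00]  # fraction of traffic on S3 Vectors

def next_stage(current: float, p95_green: bool) -> float:
    """Advance one stage while p95 stays green; otherwise step back one."""
    i = STAGES.index(current)
    if p95_green:
        return STAGES[min(i + 1, len(STAGES) - 1)]
    return STAGES[max(i - 1, 0)]
```

Wire the `p95_green` input to your alerting system's state so the shift schedule and the rollback path run off the same signal your dashboards show.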
Tradeoffs, gotchas, and patterns that actually work
Don’t fight the quotas—embrace horizontal scale. If your write rate bursts above a single index’s limits, shard by a natural key (customer, region, time) and parallelize. Accept that warm shards will be slower; route UX accordingly. (docs.aws.amazon.com)
Keep an eye on metadata bloat. Teams often cram entire documents into vector metadata; that’s a fast path to slower filters and higher bills. Store references (S3 URIs) and a few filterable attributes; retrieve the full payload from S3 only for the top results.
Encryption and tenancy aren’t afterthoughts. If you sell enterprise, assume per‑index KMS keys and tenant‑scoped IAM policies are table stakes. You’ll close deals faster when security reviews see clean boundaries. (aws.amazon.com)
Finally, don’t over‑rotate on millisecond heroics if your product doesn’t need it. Many applications are perfectly fine with sub‑second retrieval—especially when the trade buys you durability, simpler ops, and a smaller monthly bill. The GA performance guidance gives you room to be pragmatic. (aws.amazon.com)
What to do next
For developers:
- Prototype one retrieval path on S3 Vectors this week; measure p50/p95 against today’s stack.
- Decide your partitioning scheme and metadata budget; enforce it in code review.
- Integrate with Bedrock Knowledge Bases for a fast RAG baseline you can iterate. (aws.amazon.com)
For product and engineering leaders:
- Revisit your AI data plane: if vectors, raw corpora, and artifacts all move to S3, your platform roadmap—and cost model—gets simpler.
- Plan a sunset path for any over‑provisioned vector DB clusters; reallocate those dollars to ranking quality and evals.
- Consider the 50 TB object limit in your data ingest plans; it can remove a whole class of bespoke sharding jobs. (aws.amazon.com)
If you’d like a hands‑on plan, our 30‑day Amazon Bedrock AgentCore guide pairs neatly with S3 Vectors for agent memory, and our multicloud interconnect primer covers cross‑region data movement patterns when you’re bridging clouds. See what we’ve delivered for teams like yours in our project portfolio, or talk to us via services to pressure‑test your design before you scale.
