Everyone Talks About LLMs. Almost No One Talks About Where Knowledge Really Lives

The distinction between the model that generates an answer and the store that supplies its knowledge seems simple, but it carries enormous architectural consequences: the quality of a production LLM's responses is limited, above all, by the quality of the retrieval layer that feeds it. You can switch models, adjust temperature, and refine prompts, and still get vague or incorrect answers if your retrieval system is poorly designed.

That's where vector databases come in.

What is a Vector Database and Why Does it Matter

Any content, whether text, image, audio, or structured document, can be converted into an embedding: a high-dimensional numerical vector that represents the semantic meaning of that content in a mathematical space.

The central idea is simple: semantically close contents are geometrically close in this space. "Service provision contract" and "supply agreement" will be close to each other, even if they don't share any words.
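To make "geometrically close" concrete, here is a minimal sketch that compares vectors with cosine similarity, one common closeness measure. The three 4-dimensional vectors are invented for illustration; real embeddings come from an embedding model and typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, hand-written for this example (not model output).
service_contract = [0.82, 0.10, 0.55, 0.05]
supply_agreement = [0.78, 0.15, 0.60, 0.02]
pasta_recipe = [0.05, 0.90, 0.02, 0.40]

close = cosine_similarity(service_contract, supply_agreement)
far = cosine_similarity(service_contract, pasta_recipe)

print(f"contract vs. agreement: {close:.3f}")
print(f"contract vs. recipe:    {far:.3f}")
```

With these made-up values, the two legal texts score much closer to each other than either does to the recipe, which is the behavior a good embedding model aims for.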

Vector databases are specialized storage and indexing systems designed to operate efficiently on these vectors. They solve a problem that traditional relational databases were not designed to solve: high-dimensional similarity search, at scale, in real-time.
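As a rough illustration of the problem being solved, the sketch below performs exact similarity search by scoring every stored vector against the query. The store contents and document IDs are hypothetical. This linear scan is fine for a handful of vectors, but its cost grows with every vector added, which is why vector databases rely on approximate indexes instead.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=2):
    """Exact nearest-neighbor search: score every stored vector, keep the k best."""
    scored = ((cosine_similarity(query, vec), doc_id) for doc_id, vec in vectors.items())
    return heapq.nlargest(k, scored)


# Hypothetical store: document ID -> hand-written embedding.
store = {
    "doc-contract": [0.82, 0.10, 0.55, 0.05],
    "doc-agreement": [0.78, 0.15, 0.60, 0.02],
    "doc-recipe": [0.05, 0.90, 0.02, 0.40],
}

query = [0.80, 0.12, 0.58, 0.03]
results = top_k(query, store, k=2)

for score, doc_id in results:
    print(doc_id, round(score, 3))
```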

Every AI application that performs semantic search, RAG (Retrieval-Augmented Generation), recommendations, similarity matching, or long-term memory relies on vector search under the hood, whether through a dedicated vector database or a vector index added to an existing store.

The Current Landscape: 16 Options, Very Different Trade-Offs

The ecosystem has grown rapidly and now offers over a dozen production-level options. The wrong choice won't destroy your project, but it will create operational friction, performance bottlenecks, and technical debt that's hard to reverse.

The categories below organize the main tools by usage profile.

Cloud-Native Managed Services

For teams that want zero operational overhead and need to scale with predictability.

Pinecone: Fully managed serverless indexing. Zero operations, fast and predictable scaling. The default choice for those who want to start quickly in production without managing infrastructure. Natively integrates with LangChain and major orchestration frameworks.

Weaviate: Integrated vectorization with support for hybrid search (vector + BM25). Modular architecture and native GraphQL. Good choice when you need flexibility in schema definition and want to combine semantic search with structured filters. Available in managed cloud and Docker.

Qdrant: Written in Rust. Server-side payload filtering, low latency, and notable memory efficiency. One of the most performant options for workloads where P99 latency matters. API via gRPC and REST. Available in managed cloud and self-hosted.

Milvus: Designed to operate on a billion-vector scale. High throughput, distributed architecture, support for multiple index types (IVF, HNSW, DiskANN). The choice when you're building large-scale retrieval and need granular control over index topology. Managed interface via Zilliz Cloud.

Embedded in Existing Stacks

For teams that already have consolidated infrastructure and want to minimize the number of systems to operate.

pgvector: PostgreSQL extension that adds vector types and similarity operators directly to the relational database. ACID compliant, integrates with the entire existing Postgres toolchain. The natural choice when the team already operates Postgres and the volume of vectors is in the millions. Supabase offers pgvector as a managed service.

Redis Vector: In-memory vector indexing over Redis Stack. Ultra-low latency for real-time use cases: live recommendations, feed personalization, semantic caching. The choice when you already operate Redis and need retrieval with sub-millisecond latency.

MongoDB Atlas Vector Search: Native vector search within Atlas. Unifies document data and embeddings in the same store, eliminating the need for synchronization between systems. Good choice for applications that already use MongoDB as the primary database and want to add semantic capabilities without introducing new infrastructure.

Azure AI Search: Managed vector search and semantic ranking by Microsoft. Native integration with the Azure ecosystem (OpenAI, Cognitive Services, Data Factory). The choice for enterprise environments that already operate within Azure and need a governed solution with SLA and corporate support.

Hybrid Search: Keyword and Vector

For cases where combining lexical and semantic search is necessary.

OpenSearch (kNN): Open-source fork of Elasticsearch, created by AWS and now governed by the OpenSearch Software Foundation under the Linux Foundation. Combines vector search with lexical search (BM25) in a single query. Configurable hybrid scoring. Good choice for teams that already operate OpenSearch or need a self-hosted solution with cloud support.

Elasticsearch (kNN): Integrated dense vector search in the mature Elastic ecosystem. Allows combining kNN with full-text queries in a single request. For teams that already run Elasticsearch, it's the path of least resistance to add semantic capabilities without migrating stacks.
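To show what combining lexical and vector results can look like, here is a minimal sketch of reciprocal rank fusion (RRF), one widely used way to merge two ranked lists. It is an illustration of the general idea, not a description of how any engine above scores by default. The document IDs and rankings are hypothetical, and `k=60` is the constant conventionally used with RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one list.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical results for the same query from two retrievers.
bm25_ranking = ["doc-a", "doc-b", "doc-c"]    # lexical (keyword) search
vector_ranking = ["doc-c", "doc-a", "doc-d"]  # semantic (vector) search

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)
```

RRF works on ranks rather than raw scores, which sidesteps the problem that BM25 scores and vector similarities live on different scales.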

Specialized and Embedded

For prototyping, specific workloads, or cases where operational simplicity is a priority.

Chroma: Embedded store, lightweight, and without external dependencies. Zero configuration for local development. The default choice for RAG prototypes and applications in the exploration phase. Native integration with LangChain. Not the choice for large-scale production.

FAISS (Meta): Low-level ANN (Approximate Nearest Neighbor) indexing library developed by Meta. GPU-accelerated, highly configurable, support for multiple indexing algorithms. Not a database: it's a library you embed in your process. The choice when you need total control over the index and performance is the dominant criterion.

LanceDB: Embedded column store in the Lance format (based on Apache Arrow). Serverless, disk-efficient, designed for multimodal workloads (text, image, video, audio). Good choice for applications that need to persist embeddings of multiple modalities without operating a separate service.

Marqo: End-to-end vector search engine: you send documents, it generates the embeddings and indexes. Eliminates the preprocessing step. Good choice for teams that want a simple API without worrying about the choice of embedding model.

Vespa: Real-time serving and ranking platform, built for scale. Combines vector search with rule-based ranking, ML models, and custom business logic. The choice for large-scale recommendation and search systems where ranking is as important as retrieval.

Vald: Cloud-native distributed ANN, native to Kubernetes. Auto-indexing, fault-tolerant by design, deployment via Helm. The choice for teams that operate Kubernetes and need a vector store that behaves like a first-class citizen in the cluster.

How to Choose: A Decision Framework

The choice of vector database should be guided by four axes:

1. Volume and Growth Rate: Up to ~10 million vectors, pgvector or Chroma handle the load with minimal operational effort. Between 10M and 500M, Qdrant, Weaviate, or Pinecone are solid choices. Above that, Milvus or Vespa are the architecturally sound options.

2. Required Latency: For real-time retrieval with sub-millisecond requirements, Redis Vector. For batch or asynchronous workloads, store latency is rarely the bottleneck. For tight P99 in production, measure Qdrant and Pinecone on your specific workload before deciding.

3. Existing Stack: The simplest rule: if you already operate Postgres, start with pgvector. If you already operate Redis, start with Redis Vector. If you already operate MongoDB, start with Atlas Vector Search. Reducing operational surface has real value that performance benchmarks don't capture.

4. Type of Search Needed: Purely semantic search: any option works. Hybrid search (semantic + keyword): Weaviate, Elasticsearch, OpenSearch. Search with heavy metadata filtering: Qdrant has an architectural advantage. Complex ranking with multiple signals: Vespa.
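The axes above can be sketched as a small function. This is only an illustration of the heuristics in this article: the function name, parameter names, and the order in which the axes are checked (existing stack first, then search type, then volume) are assumptions made for the example. The latency axis is left out on purpose, since it calls for measurement on your own workload rather than a lookup.

```python
def recommend_store(num_vectors, existing_stack=None, search_type="semantic"):
    """Return candidate vector stores using the article's rules of thumb.

    The priority order of the checks is an assumption of this sketch.
    """
    # Axis 3: reuse what the team already operates.
    stack_defaults = {
        "postgres": ["pgvector"],
        "redis": ["Redis Vector"],
        "mongodb": ["Atlas Vector Search"],
    }
    if existing_stack in stack_defaults:
        return stack_defaults[existing_stack]

    # Axis 4: specialized search needs narrow the field.
    by_search_type = {
        "hybrid": ["Weaviate", "Elasticsearch", "OpenSearch"],
        "filtered": ["Qdrant"],
        "ranking": ["Vespa"],
    }
    if search_type in by_search_type:
        return by_search_type[search_type]

    # Axis 1: otherwise decide by volume.
    if num_vectors <= 10_000_000:
        return ["pgvector", "Chroma"]
    if num_vectors <= 500_000_000:
        return ["Qdrant", "Weaviate", "Pinecone"]
    return ["Milvus", "Vespa"]


print(recommend_store(2_000_000))
print(recommend_store(50_000_000, search_type="hybrid"))
print(recommend_store(3_000_000_000))
```

Treat the output as a shortlist to benchmark, not a verdict.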

Quick Decision Guide

| Scenario | Recommendation |
|---|---|
| RAG prototype, initial exploration | Chroma, pgvector |
| Production SaaS application | Pinecone, Weaviate, Qdrant |
| Billion-vector scale | Milvus, Vespa |
| Existing PostgreSQL stack | pgvector |
| Existing Redis stack | Redis Vector |
| Existing MongoDB stack | Atlas Vector Search |
| Multimodal workloads | LanceDB |
| Kubernetes-native distributed | Vald |
| Enterprise Azure environment | Azure AI Search |
| Hybrid lexical-semantic search | Weaviate, Elasticsearch (kNN) |

What Most Teams Get Wrong

Optimizing the vector store before optimizing chunking. The strategy for dividing documents into chunks and the choice of embedding model have a much greater impact on retrieval quality than the choice of database. A well-chunked corpus in Chroma will generally outperform a poorly chunked one in Pinecone.
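As a starting point for experimenting with chunking, here is a minimal sketch of fixed-size chunking with overlap, one of the simplest strategies. The function name and default sizes are assumptions for illustration, and splitting on whitespace is a simplification: real pipelines often split on tokens, sentences, or document structure instead.

```python
def chunk_words(text, chunk_size=200, overlap=40):
    """Split text into chunks of up to chunk_size words.

    Consecutive chunks share `overlap` words, so a sentence that straddles
    a boundary still appears whole in at least one chunk.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


# Tiny synthetic document so the overlap is easy to see.
document = " ".join(f"word{i}" for i in range(10))
chunks = chunk_words(document, chunk_size=4, overlap=1)

for chunk in chunks:
    print(chunk)
```

Chunk size and overlap are worth tuning against a retrieval metric rather than picking once and forgetting.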

Not measuring recall before going into production. Many RAG implementations ship without any retrieval-quality metric. Measure recall@k (the share of known-relevant documents that appear in the top k results) on a representative query set before optimizing anything else.
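Measuring recall@k takes very little code once you have a labeled query set. The sketch below assumes you have, for each query, the set of document IDs a human judged relevant and the ranked list your retriever returned. The query and document IDs are hypothetical.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant documents that appear in the top k retrieved."""
    if not relevant:
        raise ValueError("each query needs at least one relevant document")
    top_k = set(retrieved[:k])
    return len(top_k & set(relevant)) / len(relevant)


def mean_recall_at_k(results, ground_truth, k):
    """Average recall@k over every query in the labeled set."""
    scores = [recall_at_k(results[q], ground_truth[q], k) for q in ground_truth]
    return sum(scores) / len(scores)


# Hypothetical labels: query ID -> document IDs judged relevant.
ground_truth = {"q1": {"d1", "d2"}, "q2": {"d7"}}

# Hypothetical retriever output: query ID -> ranked document IDs.
results = {"q1": ["d1", "d5", "d2", "d9"], "q2": ["d3", "d4", "d7"]}

for k in (2, 3):
    print(f"recall@{k}: {mean_recall_at_k(results, ground_truth, k):.2f}")
```

Even a few dozen labeled queries give you a baseline to compare chunking strategies, embedding models, and index settings against.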

Ignoring indexing latency. For use cases with frequent data updates, indexing speed is as important as search speed. Vector stores differ widely in this respect.

Underestimating the operational cost of self-hosting. Self-hosted Milvus and Weaviate offer total control but require Kubernetes expertise, monitoring, index tuning, and backup management. Honestly estimate this cost before dismissing managed options.

Conclusion

The retrieval layer sets the quality ceiling of any retrieval-based AI application. You can use the best model in the world, but if the context provided to it is irrelevant or incomplete, the response will suffer accordingly.

The good news: the vector store ecosystem has never been more mature. There are solid options for every point on the spectrum of operational complexity, volume, and latency requirements. The wrong choice is rarely catastrophic, but the right choice, made early with the right criteria, eliminates a whole class of problems you won't want to debug in production.

The future of AI isn't just better models. It's better retrieval.

Which vector database are you using in production today?