In AI, a vector is a list of numbers representing the semantic meaning of data (text, images, audio). A vector database is a specialized storage system optimized for efficiently storing millions of these vectors and performing fast similarity search — finding vectors closest to a query vector using distance metrics like cosine similarity or Euclidean distance.
What makes vector databases different
Traditional databases are built for exact lookups. Vector databases are built for approximate nearest-neighbor (ANN) search — finding the k vectors geometrically closest to a query vector. These are fundamentally different computational problems:
| Property | Traditional DB (SQL) | Vector DB |
|---|---|---|
| Query type | Exact match / range scan | Nearest-neighbor (by similarity) |
| Index structure | B-tree, hash index | HNSW graph, IVF, or product quantization |
| Query language | SQL WHERE clause | ANN search with k and metric |
| Scale | Billions of rows | Millions–billions of vectors |
| Latency | <1ms exact lookup | 1–50ms ANN search at scale |
| Use case | Structured data retrieval | Semantic search, RAG, recommendations |
Exact nearest-neighbor search over n vectors in d dimensions costs O(n·d) per query: for 100M vectors of dimension 1536, that is roughly 154 billion operations per query. ANN algorithms like HNSW cut this to roughly O(d · log n) per query, trading a small recall loss (recall above 99% is achievable with tuned parameters) for orders-of-magnitude speedups.
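The brute-force cost is easy to see directly. A toy sketch (sizes scaled down from the 100M × 1536 example; data is random, not real embeddings):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 10_000, 64                 # scaled-down stand-in for 100M x 1536
vectors = rng.standard_normal((n, d)).astype(np.float32)
query = rng.standard_normal(d).astype(np.float32)

# Exact search: one O(d) distance per stored vector -> O(n*d) per query
dists = np.linalg.norm(vectors - query, axis=1)
top5 = np.argsort(dists)[:5]
print("exact top-5 ids:", top5)
print("multiply-adds ~", n * d)   # 640,000 here; ~1.5e11 at 100M x 1536
```

Every query touches every vector, which is exactly what ANN indexes exist to avoid.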
HNSW: how fast ANN search works
Hierarchical Navigable Small World (HNSW, Malkov & Yashunin 2020) is the dominant ANN algorithm used by pgvector, Pinecone, Weaviate, Milvus, and Qdrant. It builds a multi-layer proximity graph:
- Layer 0: dense graph — every vector is a node, connected to its M nearest neighbors
- Layer 1+: progressively sparser; each layer contains roughly a 1/M fraction of the nodes in the layer below, so layer populations decay exponentially with height
- Search: start at the top layer (few nodes, long-range jumps), greedily navigate toward query, descend to denser layers, refine at layer 0
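The layer structure above comes from a random level draw per inserted node. A sketch of the standard assignment rule from the HNSW paper, level = ⌊−ln(U) · mL⌋ with mL = 1/ln(M), showing the exponential thinning empirically:

```python
import numpy as np

rng = np.random.default_rng(42)
M = 16
mL = 1.0 / np.log(M)                      # level multiplier from the paper
u = 1.0 - rng.random(1_000_000)           # uniform draws in (0, 1]
levels = np.floor(-np.log(u) * mL).astype(int)

# Fraction of nodes whose top layer is >= l decays as ~M**-l
for l in range(3):
    print(f"layer >= {l}: {np.mean(levels >= l):.4%} (expected ~{M**-l:.4%})")
```

With M=16, only about 6% of nodes reach layer 1 and about 0.4% reach layer 2, which is what makes the upper layers cheap long-range "highways".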
HNSW with hnswlib — million-scale ANN in Python

```python
import hnswlib
import numpy as np

DIM = 1536        # e.g., OpenAI text-embedding-3-small dimension
N_VECTORS = 1_000_000

# Build the index
index = hnswlib.Index(space='cosine', dim=DIM)
index.init_index(
    max_elements=N_VECTORS,
    ef_construction=200,  # higher = better recall, slower build
    M=16                  # edges per node; 16–64 is typical
)

# Add vectors (batch for speed)
vectors = np.random.randn(N_VECTORS, DIM).astype(np.float32)
ids = np.arange(N_VECTORS)
index.add_items(vectors, ids)

# Set query-time exploration parameter
index.set_ef(50)  # ef >= k; higher = better recall, slower

# Query: find top-10 nearest neighbors
query = np.random.randn(DIM).astype(np.float32)
labels, distances = index.knn_query(query, k=10)
print(f"Top-10 neighbor IDs: {labels[0]}")
print(f"Cosine distances: {distances[0].round(4)}")

# Save / load index
index.save_index("my_index.bin")
# index.load_index("my_index.bin")  # reload without rebuild
```

HNSW vs IVF
IVF (Inverted File Index) partitions vectors into clusters and only searches the closest clusters at query time. IVF uses far less memory than HNSW but has lower recall at the same speed. For production RAG at <50M vectors, HNSW is preferred. For >100M vectors where RAM is the bottleneck, IVF with product quantization (IVF-PQ) is typical.
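To make the partition-then-probe idea concrete, here is a minimal IVF sketch in pure NumPy: a toy k-means builds the clusters, and queries probe only the closest few. The names `nlist` and `nprobe` follow FAISS conventions, but this is an illustration, not a library implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, nlist, nprobe = 5_000, 32, 16, 4
data = rng.standard_normal((n, d)).astype(np.float32)

# "Train": a few k-means-style iterations to get cluster centroids
centroids = data[rng.choice(n, nlist, replace=False)].copy()
for _ in range(5):
    assign = np.argmin(((data[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
    for c in range(nlist):
        members = data[assign == c]
        if len(members):
            centroids[c] = members.mean(axis=0)
assign = np.argmin(((data[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)

# Inverted lists: cluster id -> member vector ids
lists = {c: np.where(assign == c)[0] for c in range(nlist)}

def ivf_search(query, k=5):
    # Probe only the nprobe closest clusters instead of all n vectors
    order = np.argsort(((centroids - query) ** 2).sum(-1))[:nprobe]
    cand = np.concatenate([lists[c] for c in order])
    d2 = ((data[cand] - query) ** 2).sum(-1)
    return cand[np.argsort(d2)[:k]]

q = rng.standard_normal(d).astype(np.float32)
print("IVF top-5 ids:", ivf_search(q))
```

With nprobe=4 of 16 clusters, each query scans roughly a quarter of the data; raising nprobe trades speed for recall, the same knob real IVF indexes expose.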
Distance metrics: cosine vs Euclidean
The choice of distance metric fundamentally affects what 'similar' means. Three metrics dominate in practice:
Cosine similarity: angle between vectors, independent of magnitude. Range: [−1, 1]. 1 = identical direction, 0 = orthogonal, −1 = opposite. Best for text embeddings where length normalization is desired.
Euclidean (L2) distance: straight-line distance in embedding space. Sensitive to vector magnitude. Best when magnitude carries information (e.g., image descriptors).
Dot product: equivalent to cosine similarity when vectors are L2-normalized. Maximum Inner Product Search (MIPS) is used in recommendation systems where magnitude encodes popularity or confidence.
Computing all three distance metrics

```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def euclidean_distance(a, b):
    return np.linalg.norm(a - b)

def dot_product(a, b):
    return np.dot(a, b)

# Example with text-like embeddings
a = np.array([0.8, 0.2, 0.5, -0.1])
b = np.array([0.7, 0.3, 0.4, 0.1])    # similar to a
c = np.array([-0.5, 0.9, -0.3, 0.8])  # different from a

print(f"Cosine(a,b) = {cosine_similarity(a, b):.4f}")  # high (similar)
print(f"Cosine(a,c) = {cosine_similarity(a, c):.4f}")  # low (different)
print(f"L2(a,b) = {euclidean_distance(a, b):.4f}")     # small (similar)
print(f"L2(a,c) = {euclidean_distance(a, c):.4f}")     # large (different)

# For L2-normalized vectors: dot product = cosine similarity
a_norm = a / np.linalg.norm(a)
b_norm = b / np.linalg.norm(b)
print(f"Dot(norm_a, norm_b) = {dot_product(a_norm, b_norm):.4f}")
print(f"Cosine(a, b) = {cosine_similarity(a, b):.4f}")
```

Vector database options in 2025
| Database | Type | Best for | Notes |
|---|---|---|---|
| pgvector | PostgreSQL extension | Existing Postgres stacks, <10M vectors | No separate infrastructure; HNSW + IVF; SQL-native |
| Pinecone | Managed cloud | Production RAG, fastest setup | Fully managed; high cost at scale; serverless tier available |
| Qdrant | Open-source / cloud | High-performance self-hosted | Rust core; fastest open-source; sparse+dense hybrid search |
| Weaviate | Open-source / cloud | Multimodal, knowledge graphs | Modules for CLIP, text2vec; GraphQL interface |
| Milvus | Open-source / cloud | Billions of vectors, enterprise | Distributed architecture; GPU-accelerated indexing |
| ChromaDB | Open-source (Python) | Prototyping, development | Zero setup; in-memory or persistent; not production-scale |
Choosing a vector database
For most RAG applications with <5M chunks: pgvector in your existing Postgres. For production with 5M–100M vectors: Qdrant (self-hosted) or Pinecone (managed). For >100M vectors or multi-modal: Milvus. Don't prematurely migrate — pgvector with HNSW handles millions of vectors with sub-10ms latency on modest hardware.
Metadata filtering with vectors
Production RAG systems rarely do pure vector search — they combine semantic similarity with metadata filters to scope retrieval to the right documents, users, or date ranges:
pgvector: combining vector similarity with metadata filters
```sql
-- Create a table with embeddings + metadata
CREATE TABLE document_chunks (
    id BIGSERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL,
    doc_id INTEGER NOT NULL,
    page_num INTEGER,
    chunk_text TEXT,
    embedding VECTOR(1536),
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Create HNSW index on embedding column
CREATE INDEX ON document_chunks
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- Also index metadata for fast pre-filtering
CREATE INDEX ON document_chunks (user_id, doc_id);

-- ─── Filtered vector search ─────────────────────────────────────
-- Find top-5 chunks most similar to query_embedding,
-- but ONLY from doc_id=42 belonging to user_id=123, pages 30-60
SELECT
    id,
    chunk_text,
    page_num,
    1 - (embedding <=> '[0.1, 0.2, ..., 0.9]'::vector) AS similarity
FROM document_chunks
WHERE user_id = 123
  AND doc_id = 42
  AND page_num BETWEEN 30 AND 60
ORDER BY embedding <=> '[0.1, 0.2, ..., 0.9]'::vector  -- cosine distance
LIMIT 5;
```

Pre-filtering vs post-filtering
Pre-filtering (apply the metadata filter before ANN search) can miss good results if too few vectors remain after filtering. Post-filtering (ANN search first, then filter) may return fewer than k results if many of the top-k neighbors fail the filter. A common mitigation for post-filtering is to over-fetch (e.g., retrieve k=100 candidates), apply the filter, then re-rank and truncate to the final k=5. In pgvector, filters are just standard SQL WHERE clauses; recent versions (0.8.0+) add iterative index scans, which keep scanning the HNSW index until enough matching rows are found, improving filtered recall.
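The over-fetch pattern is easy to sketch. Here brute-force search stands in for the ANN index, and the `user_id` metadata and sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 10_000, 64
vectors = rng.standard_normal((n, d)).astype(np.float32)
user_ids = rng.integers(0, 5, size=n)      # toy metadata: 5 users
query = rng.standard_normal(d).astype(np.float32)
target_user = 2

def search(query, k):
    """Stand-in for an ANN index query (brute force here)."""
    d2 = ((vectors - query) ** 2).sum(-1)
    return np.argsort(d2)[:k]

# Post-filtering with over-fetch: ask the index for far more than needed
# (k=100), drop rows that fail the metadata filter, truncate to final k=5
candidates = search(query, k=100)
matches = [i for i in candidates if user_ids[i] == target_user]
top5 = matches[:5]
print(f"{len(matches)} of {len(candidates)} candidates matched the filter")
```

If the filter is highly selective (say 1 in 1000 rows match), even a large over-fetch can come back short, which is when pre-filtering or an iterative scan is the better tool.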
Practice questions
- What is the geometric interpretation of dot product and why does it measure similarity? (Answer: A·B = ||A|| ||B|| cos(θ). If both vectors are unit length: A·B = cos(θ) — the cosine of the angle between them. When θ=0 (same direction): cos(0)=1, maximum similarity. When θ=90° (orthogonal): cos(90°)=0, no similarity. When θ=180° (opposite): cos(180°)=-1, maximum dissimilarity. For embeddings: similar concepts have small angular separation → high cosine similarity. This geometric interpretation is why cosine similarity is the standard metric for embedding comparison.)
- What is the difference between L1 norm, L2 norm, and L∞ norm of a vector? (Answer: L1 norm (Manhattan): ||v||₁ = Σ|vᵢ| — sum of absolute values. L2 norm (Euclidean): ||v||₂ = √(Σvᵢ²) — geometric length. L∞ norm (Chebyshev): ||v||∞ = max|vᵢ| — largest absolute value. Applications: L2 in distance and similarity calculations; L1 in sparse representations and LASSO; L∞ for adversarial perturbation bounds (maximum pixel change). Gradient clipping uses L2 norm; L1 norm regularisation promotes sparsity.)
- Why is it impossible to visualise embeddings directly, and what techniques help? (Answer: Embeddings are typically 768–3072 dimensional. Humans can perceive at most 3 dimensions. Dimensionality reduction techniques: PCA projects to top-2 principal components (preserves global variance structure). t-SNE (perplexity-based) preserves local neighbourhood structure — similar embeddings cluster visually. UMAP (topological) preserves both local and global structure better than t-SNE. All lose information. Use t-SNE/UMAP for exploring cluster structure; use PCA for understanding variance explained by dimensions.)
- What is a sparse vector vs a dense vector and when is each appropriate? (Answer: Sparse vector: most entries are zero — efficiently stored as (index, value) pairs. BM25/TF-IDF produces sparse vectors: only vocabulary words present in a document have non-zero values (vocabulary size: 50,000+, document has ~100 unique words → sparse). Dense vector: all entries are non-zero — neural embedding outputs. 768 floats all meaningful. Use sparse for: exact keyword matching, interpretable features, memory-constrained environments. Use dense for: semantic search, capturing contextual meaning, transfer learning.)
- What is vector arithmetic and what does it reveal about embedding spaces? (Answer: King - Man + Woman ≈ Queen. Embeddings encode relational structure as vector offsets. The ' - Man + Woman' operation is a vector translation in the gender direction. Similarly: Paris - France + Italy ≈ Rome (capital cities relationship). This arithmetic works because training on text that discusses relationships (kings and queens, capitals and countries) creates consistent directional encodings. Applications: analogy solving, bias detection (measuring direction of gender/race bias), concept interpolation, and zero-shot classification via concept vectors.)
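The identities in the first two answers can be checked numerically with toy vectors (not real embeddings):

```python
import numpy as np

# A·B = ||A|| ||B|| cos(theta): recover the angle from the dot product
a = np.array([3.0, 4.0])
b = np.array([4.0, 3.0])
cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(f"cos(theta) = {cos_theta:.2f}")           # 24 / (5 * 5) = 0.96
print(f"theta = {np.degrees(np.arccos(cos_theta)):.2f} degrees")

# L1, L2, and L-infinity norms of the same vector
v = np.array([3.0, -4.0, 0.0, 1.0])
print("L1   =", np.linalg.norm(v, ord=1))        # |3| + |-4| + |0| + |1| = 8.0
print("L2   =", np.linalg.norm(v))               # sqrt(9 + 16 + 0 + 1)
print("Linf =", np.linalg.norm(v, ord=np.inf))   # max |v_i| = 4.0
```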
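The sparse-vs-dense distinction from the fourth answer, sketched with a dict-of-index-value sparse representation (toy TF-IDF-style weights, made-up indices):

```python
import numpy as np

vocab_size = 50_000
# Sparse docs: only a handful of the 50,000 vocabulary slots are non-zero
doc_a = {102: 0.8, 4_551: 0.3, 30_007: 0.5}
doc_b = {102: 0.6, 9_120: 0.9, 30_007: 0.2}

def sparse_dot(a, b):
    # Only shared indices contribute; cost is O(non-zeros), not O(vocab_size)
    return sum(v * b[i] for i, v in a.items() if i in b)

dense = np.zeros(vocab_size)     # the same doc stored densely: 50,000 floats
for i, v in doc_a.items():
    dense[i] = v

print("sparse dot:", sparse_dot(doc_a, doc_b))   # 0.8*0.6 + 0.5*0.2 = 0.58
print("non-zeros:", len(doc_a), "of", vocab_size)
```

The sparse form stores 3 pairs instead of 50,000 floats, which is why keyword indexes scale so cheaply compared to dense embeddings.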
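The analogy arithmetic from the last answer can be illustrated with hand-built 2-D vectors, where one axis encodes "royalty" and the other "gender". Real embeddings are learned and high-dimensional; this toy only shows how a relation becomes a consistent vector offset:

```python
import numpy as np

# Hand-built toy "embeddings": axis 0 = royalty, axis 1 = gender
words = {
    "king":  np.array([1.0,  1.0]),
    "queen": np.array([1.0, -1.0]),
    "man":   np.array([0.0,  1.0]),
    "woman": np.array([0.0, -1.0]),
}

# King - Man + Woman: translate along the gender direction -> [1, -1]
target = words["king"] - words["man"] + words["woman"]

def nearest(v, exclude):
    # Highest cosine similarity among non-excluded words
    return max(
        (w for w in words if w not in exclude),
        key=lambda w: np.dot(v, words[w])
        / (np.linalg.norm(v) * np.linalg.norm(words[w])),
    )

print(nearest(target, exclude={"king"}))  # queen
```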
On LumiChats
LumiChats uses pgvector (PostgreSQL vector extension via Supabase) for all vector operations. Document chunks and memory embeddings are stored and retrieved using cosine similarity. Metadata filters on user_id and document_id ensure each user only retrieves their own content.