In AI, a vector is a list of numbers representing the semantic meaning of data (text, images, audio). A vector database is a specialized storage system optimized for efficiently storing millions of these vectors and performing fast similarity search — finding vectors closest to a query vector using distance metrics like cosine similarity or Euclidean distance.
What makes vector databases different
Traditional databases are built for exact lookups. Vector databases are built for approximate nearest-neighbor (ANN) search — finding the k vectors geometrically closest to a query vector. These are fundamentally different computational problems:
| Property | Traditional DB (SQL) | Vector DB |
|---|---|---|
| Query type | Exact match / range scan | Nearest-neighbor (by similarity) |
| Index structure | B-tree, hash index | HNSW graph, IVF, or product quantization |
| Query language | SQL WHERE clause | ANN search with k and metric |
| Scale | Billions of rows | Millions–billions of vectors |
| Latency | <1ms exact lookup | 1–50ms ANN search at scale |
| Use case | Structured data retrieval | Semantic search, RAG, recommendations |
Exact nearest-neighbor search over n vectors in d dimensions costs O(n·d) per query: for 100M vectors of dimension 1536, that is roughly 154 billion operations per query. ANN algorithms like HNSW cut this to roughly O(d · log n) per query, trading a small recall loss (recall above 99% is achievable with tuned parameters) for orders-of-magnitude speedups.
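The brute-force cost is easy to see directly. A toy sketch (sizes scaled down from the 100M × 1536 example; data is random, not real embeddings):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 10_000, 64                 # scaled-down stand-in for 100M x 1536
vectors = rng.standard_normal((n, d)).astype(np.float32)
query = rng.standard_normal(d).astype(np.float32)

# Exact search: one O(d) distance per stored vector -> O(n*d) per query
dists = np.linalg.norm(vectors - query, axis=1)
top5 = np.argsort(dists)[:5]
print("exact top-5 ids:", top5)
print("multiply-adds ~", n * d)   # 640,000 here; ~1.5e11 at 100M x 1536
```

Every query touches every vector, which is exactly what ANN indexes exist to avoid.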
HNSW: how fast ANN search works
Hierarchical Navigable Small World (HNSW, Malkov & Yashunin 2020) is the dominant ANN algorithm used by pgvector, Pinecone, Weaviate, Milvus, and Qdrant. It builds a multi-layer proximity graph:
- Layer 0: dense graph — every vector is a node, connected to its M nearest neighbors
- Layer 1+: progressively sparser; each layer contains roughly a 1/M fraction of the nodes in the layer below, so layer populations decay exponentially with height
- Search: start at the top layer (few nodes, long-range jumps), greedily navigate toward query, descend to denser layers, refine at layer 0
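The layer structure above comes from a random level draw per inserted node. A sketch of the standard assignment rule from the HNSW paper, level = ⌊−ln(U) · mL⌋ with mL = 1/ln(M), showing the exponential thinning empirically:

```python
import numpy as np

rng = np.random.default_rng(42)
M = 16
mL = 1.0 / np.log(M)                      # level multiplier from the paper
u = 1.0 - rng.random(1_000_000)           # uniform draws in (0, 1]
levels = np.floor(-np.log(u) * mL).astype(int)

# Fraction of nodes whose top layer is >= l decays as ~M**-l
for l in range(3):
    print(f"layer >= {l}: {np.mean(levels >= l):.4%} (expected ~{M**-l:.4%})")
```

With M=16, only about 6% of nodes reach layer 1 and about 0.4% reach layer 2, which is what makes the upper layers cheap long-range "highways".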
HNSW with hnswlib — million-scale ANN in Python

```python
import hnswlib
import numpy as np

DIM = 1536        # e.g., OpenAI text-embedding-3-small dimension
N_VECTORS = 1_000_000

# Build the index
index = hnswlib.Index(space='cosine', dim=DIM)
index.init_index(
    max_elements=N_VECTORS,
    ef_construction=200,  # higher = better recall, slower build
    M=16                  # edges per node; 16–64 is typical
)

# Add vectors (batch for speed)
vectors = np.random.randn(N_VECTORS, DIM).astype(np.float32)
ids = np.arange(N_VECTORS)
index.add_items(vectors, ids)

# Set query-time exploration parameter
index.set_ef(50)  # ef >= k; higher = better recall, slower

# Query: find top-10 nearest neighbors
query = np.random.randn(DIM).astype(np.float32)
labels, distances = index.knn_query(query, k=10)
print(f"Top-10 neighbor IDs: {labels[0]}")
print(f"Cosine distances: {distances[0].round(4)}")

# Save / load index
index.save_index("my_index.bin")
# index.load_index("my_index.bin")  # reload without rebuild
```

HNSW vs IVF
IVF (Inverted File Index) partitions vectors into clusters and only searches the closest clusters at query time. IVF uses far less memory than HNSW but has lower recall at the same speed. For production RAG at <50M vectors, HNSW is preferred. For >100M vectors where RAM is the bottleneck, IVF with product quantization (IVF-PQ) is typical.
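To make the partition-then-probe idea concrete, here is a minimal IVF sketch in pure NumPy: a toy k-means builds the clusters, and queries probe only the closest few. The names `nlist` and `nprobe` follow FAISS conventions, but this is an illustration, not a library implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, nlist, nprobe = 5_000, 32, 16, 4
data = rng.standard_normal((n, d)).astype(np.float32)

# "Train": a few k-means-style iterations to get cluster centroids
centroids = data[rng.choice(n, nlist, replace=False)].copy()
for _ in range(5):
    assign = np.argmin(((data[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
    for c in range(nlist):
        members = data[assign == c]
        if len(members):
            centroids[c] = members.mean(axis=0)
assign = np.argmin(((data[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)

# Inverted lists: cluster id -> member vector ids
lists = {c: np.where(assign == c)[0] for c in range(nlist)}

def ivf_search(query, k=5):
    # Probe only the nprobe closest clusters instead of all n vectors
    order = np.argsort(((centroids - query) ** 2).sum(-1))[:nprobe]
    cand = np.concatenate([lists[c] for c in order])
    d2 = ((data[cand] - query) ** 2).sum(-1)
    return cand[np.argsort(d2)[:k]]

q = rng.standard_normal(d).astype(np.float32)
print("IVF top-5 ids:", ivf_search(q))
```

With nprobe=4 of 16 clusters, each query scans roughly a quarter of the data; raising nprobe trades speed for recall, the same knob real IVF indexes expose.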
Distance metrics: cosine vs Euclidean
The choice of distance metric fundamentally affects what 'similar' means. Three metrics dominate in practice:
Cosine similarity: angle between vectors, independent of magnitude. Range: [−1, 1]. 1 = identical direction, 0 = orthogonal, −1 = opposite. Best for text embeddings where length normalization is desired.
Euclidean (L2) distance: straight-line distance in embedding space. Sensitive to vector magnitude. Best when magnitude carries information (e.g., image descriptors).
Dot product: equivalent to cosine similarity when vectors are L2-normalized. Maximum Inner Product Search (MIPS) is used in recommendation systems where magnitude encodes popularity or confidence.
Computing all three distance metrics

```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def euclidean_distance(a, b):
    return np.linalg.norm(a - b)

def dot_product(a, b):
    return np.dot(a, b)

# Example with text-like embeddings
a = np.array([0.8, 0.2, 0.5, -0.1])
b = np.array([0.7, 0.3, 0.4, 0.1])    # similar to a
c = np.array([-0.5, 0.9, -0.3, 0.8])  # different from a

print(f"Cosine(a,b) = {cosine_similarity(a, b):.4f}")  # high (similar)
print(f"Cosine(a,c) = {cosine_similarity(a, c):.4f}")  # low (different)
print(f"L2(a,b) = {euclidean_distance(a, b):.4f}")     # small (similar)
print(f"L2(a,c) = {euclidean_distance(a, c):.4f}")     # large (different)

# For L2-normalized vectors: dot product = cosine similarity
a_norm = a / np.linalg.norm(a)
b_norm = b / np.linalg.norm(b)
print(f"Dot(norm_a, norm_b) = {dot_product(a_norm, b_norm):.4f}")
print(f"Cosine(a, b) = {cosine_similarity(a, b):.4f}")
```

Vector database options in 2025
| Database | Type | Best for | Notes |
|---|---|---|---|
| pgvector | PostgreSQL extension | Existing Postgres stacks, <10M vectors | No separate infrastructure; HNSW + IVF; SQL-native |
| Pinecone | Managed cloud | Production RAG, fastest setup | Fully managed; high cost at scale; serverless tier available |
| Qdrant | Open-source / cloud | High-performance self-hosted | Rust core; fastest open-source; sparse+dense hybrid search |
| Weaviate | Open-source / cloud | Multimodal, knowledge graphs | Modules for CLIP, text2vec; GraphQL interface |
| Milvus | Open-source / cloud | Billions of vectors, enterprise | Distributed architecture; GPU-accelerated indexing |
| ChromaDB | Open-source (Python) | Prototyping, development | Zero setup; in-memory or persistent; not production-scale |
Choosing a vector database
For most RAG applications with <5M chunks: pgvector in your existing Postgres. For production with 5M–100M vectors: Qdrant (self-hosted) or Pinecone (managed). For >100M vectors or multi-modal: Milvus. Don't prematurely migrate — pgvector with HNSW handles millions of vectors with sub-10ms latency on modest hardware.
Metadata filtering with vectors
Production RAG systems rarely do pure vector search — they combine semantic similarity with metadata filters to scope retrieval to the right documents, users, or date ranges:
pgvector: combining vector similarity with metadata filters
```sql
-- Create a table with embeddings + metadata
CREATE TABLE document_chunks (
    id BIGSERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL,
    doc_id INTEGER NOT NULL,
    page_num INTEGER,
    chunk_text TEXT,
    embedding VECTOR(1536),
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Create HNSW index on embedding column
CREATE INDEX ON document_chunks
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- Also index metadata for fast pre-filtering
CREATE INDEX ON document_chunks (user_id, doc_id);

-- ─── Filtered vector search ─────────────────────────────────────
-- Find top-5 chunks most similar to query_embedding,
-- but ONLY from doc_id=42 belonging to user_id=123, pages 30-60
SELECT
    id,
    chunk_text,
    page_num,
    1 - (embedding <=> '[0.1, 0.2, ..., 0.9]'::vector) AS similarity
FROM document_chunks
WHERE user_id = 123
  AND doc_id = 42
  AND page_num BETWEEN 30 AND 60
ORDER BY embedding <=> '[0.1, 0.2, ..., 0.9]'::vector  -- cosine distance
LIMIT 5;
```

Pre-filtering vs post-filtering
Pre-filtering (apply the metadata filter before ANN search) can miss good results if too few vectors remain after filtering. Post-filtering (ANN search first, then filter) may return fewer than k results if many of the top-k neighbors fail the filter. A common mitigation for post-filtering is to over-fetch (e.g., retrieve k=100 candidates), apply the filter, then re-rank and truncate to the final k=5. In pgvector, filters are just standard SQL WHERE clauses; recent versions (0.8.0+) add iterative index scans, which keep scanning the HNSW index until enough matching rows are found, improving filtered recall.
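The over-fetch pattern is easy to sketch. Here brute-force search stands in for the ANN index, and the `user_id` metadata and sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 10_000, 64
vectors = rng.standard_normal((n, d)).astype(np.float32)
user_ids = rng.integers(0, 5, size=n)      # toy metadata: 5 users
query = rng.standard_normal(d).astype(np.float32)
target_user = 2

def search(query, k):
    """Stand-in for an ANN index query (brute force here)."""
    d2 = ((vectors - query) ** 2).sum(-1)
    return np.argsort(d2)[:k]

# Post-filtering with over-fetch: ask the index for far more than needed
# (k=100), drop rows that fail the metadata filter, truncate to final k=5
candidates = search(query, k=100)
matches = [i for i in candidates if user_ids[i] == target_user]
top5 = matches[:5]
print(f"{len(matches)} of {len(candidates)} candidates matched the filter")
```

If the filter is highly selective (say 1 in 1000 rows match), even a large over-fetch can come back short, which is when pre-filtering or an iterative scan is the better tool.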
Practice questions
- What is the geometric interpretation of dot product and why does it measure similarity? (Answer: A·B = ||A|| ||B|| cos(θ). If both vectors are unit length: A·B = cos(θ) — the cosine of the angle between them. When θ=0 (same direction): cos(0)=1, maximum similarity. When θ=90° (orthogonal): cos(90°)=0, no similarity. When θ=180° (opposite): cos(180°)=-1, maximum dissimilarity. For embeddings: similar concepts have small angular separation → high cosine similarity. This geometric interpretation is why cosine similarity is the standard metric for embedding comparison.)
- What is the difference between L1 norm, L2 norm, and L∞ norm of a vector? (Answer: L1 norm (Manhattan): ||v||₁ = Σ|vᵢ| — sum of absolute values. L2 norm (Euclidean): ||v||₂ = √(Σvᵢ²) — geometric length. L∞ norm (Chebyshev): ||v||∞ = max|vᵢ| — largest absolute value. Applications: L2 in distance and similarity calculations; L1 in sparse representations and LASSO; L∞ for adversarial perturbation bounds (maximum pixel change). Gradient clipping uses L2 norm; L1 norm regularisation promotes sparsity.)
- Why is it impossible to visualise embeddings directly, and what techniques help? (Answer: Embeddings are typically 768–3072 dimensional. Humans can perceive at most 3 dimensions. Dimensionality reduction techniques: PCA projects to top-2 principal components (preserves global variance structure). t-SNE (perplexity-based) preserves local neighbourhood structure — similar embeddings cluster visually. UMAP (topological) preserves both local and global structure better than t-SNE. All lose information. Use t-SNE/UMAP for exploring cluster structure; use PCA for understanding variance explained by dimensions.)
- What is a sparse vector vs a dense vector and when is each appropriate? (Answer: Sparse vector: most entries are zero — efficiently stored as (index, value) pairs. BM25/TF-IDF produces sparse vectors: only vocabulary words present in a document have non-zero values (vocabulary size: 50,000+, document has ~100 unique words → sparse). Dense vector: all entries are non-zero — neural embedding outputs. 768 floats all meaningful. Use sparse for: exact keyword matching, interpretable features, memory-constrained environments. Use dense for: semantic search, capturing contextual meaning, transfer learning.)
- What is vector arithmetic and what does it reveal about embedding spaces? (Answer: King - Man + Woman ≈ Queen. Embeddings encode relational structure as vector offsets. The ' - Man + Woman' operation is a vector translation in the gender direction. Similarly: Paris - France + Italy ≈ Rome (capital cities relationship). This arithmetic works because training on text that discusses relationships (kings and queens, capitals and countries) creates consistent directional encodings. Applications: analogy solving, bias detection (measuring direction of gender/race bias), concept interpolation, and zero-shot classification via concept vectors.)
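The identities in the first two answers can be checked numerically with toy vectors (not real embeddings):

```python
import numpy as np

# A·B = ||A|| ||B|| cos(theta): recover the angle from the dot product
a = np.array([3.0, 4.0])
b = np.array([4.0, 3.0])
cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(f"cos(theta) = {cos_theta:.2f}")           # 24 / (5 * 5) = 0.96
print(f"theta = {np.degrees(np.arccos(cos_theta)):.2f} degrees")

# L1, L2, and L-infinity norms of the same vector
v = np.array([3.0, -4.0, 0.0, 1.0])
print("L1   =", np.linalg.norm(v, ord=1))        # |3| + |-4| + |0| + |1| = 8.0
print("L2   =", np.linalg.norm(v))               # sqrt(9 + 16 + 0 + 1)
print("Linf =", np.linalg.norm(v, ord=np.inf))   # max |v_i| = 4.0
```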
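The sparse-vs-dense distinction from the fourth answer, sketched with a dict-of-index-value sparse representation (toy TF-IDF-style weights, made-up indices):

```python
import numpy as np

vocab_size = 50_000
# Sparse docs: only a handful of the 50,000 vocabulary slots are non-zero
doc_a = {102: 0.8, 4_551: 0.3, 30_007: 0.5}
doc_b = {102: 0.6, 9_120: 0.9, 30_007: 0.2}

def sparse_dot(a, b):
    # Only shared indices contribute; cost is O(non-zeros), not O(vocab_size)
    return sum(v * b[i] for i, v in a.items() if i in b)

dense = np.zeros(vocab_size)     # the same doc stored densely: 50,000 floats
for i, v in doc_a.items():
    dense[i] = v

print("sparse dot:", sparse_dot(doc_a, doc_b))   # 0.8*0.6 + 0.5*0.2 = 0.58
print("non-zeros:", len(doc_a), "of", vocab_size)
```

The sparse form stores 3 pairs instead of 50,000 floats, which is why keyword indexes scale so cheaply compared to dense embeddings.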
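The analogy arithmetic from the last answer can be illustrated with hand-built 2-D vectors, where one axis encodes "royalty" and the other "gender". Real embeddings are learned and high-dimensional; this toy only shows how a relation becomes a consistent vector offset:

```python
import numpy as np

# Hand-built toy "embeddings": axis 0 = royalty, axis 1 = gender
words = {
    "king":  np.array([1.0,  1.0]),
    "queen": np.array([1.0, -1.0]),
    "man":   np.array([0.0,  1.0]),
    "woman": np.array([0.0, -1.0]),
}

# King - Man + Woman: translate along the gender direction -> [1, -1]
target = words["king"] - words["man"] + words["woman"]

def nearest(v, exclude):
    # Highest cosine similarity among non-excluded words
    return max(
        (w for w in words if w not in exclude),
        key=lambda w: np.dot(v, words[w])
        / (np.linalg.norm(v) * np.linalg.norm(words[w])),
    )

print(nearest(target, exclude={"king"}))  # queen
```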
On LumiChats
LumiChats uses pgvector (PostgreSQL vector extension via Supabase) for all vector operations. Document chunks and memory embeddings are stored and retrieved using cosine similarity. Metadata filters on user_id and document_id ensure each user only retrieves their own content.