
Vectors & Vector Databases

How AI stores and retrieves meaning at scale.


Definition

In AI, a vector is a list of numbers representing the semantic meaning of data (text, images, audio). A vector database is a specialized storage system optimized for efficiently storing millions of these vectors and performing fast similarity search — finding vectors closest to a query vector using distance metrics like cosine similarity or Euclidean distance.

What makes vector databases different

Traditional databases are built for exact lookups. Vector databases are built for approximate nearest-neighbor (ANN) search — finding the k vectors geometrically closest to a query vector. These are fundamentally different computational problems:

Property        | Traditional DB (SQL)       | Vector DB
Query type      | Exact match / range scan   | Nearest-neighbor (by similarity)
Index structure | B-tree, hash index         | HNSW graph, IVF, or product quantization
Query language  | SQL WHERE clause           | ANN search with k and metric
Scale           | Billions of rows           | Millions–billions of vectors
Latency         | <1 ms exact lookup         | 1–50 ms ANN search at scale
Use case        | Structured data retrieval  | Semantic search, RAG, recommendations

Exact nearest-neighbor search over n vectors in d dimensions costs O(n·d) per query — for 100M vectors of dimension 1536, that's roughly 154 billion operations per query. ANN algorithms like HNSW cut this to roughly O(d · log n) per query while retaining >99% recall — a small accuracy tradeoff for an orders-of-magnitude speedup.
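The O(n·d) baseline is worth seeing concretely: exact search is a single matrix–vector product over every stored vector. A minimal NumPy sketch at toy scale (sizes and names are illustrative, not from any particular library):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 10_000, 128, 5             # toy scale; real workloads are millions x 1536

vectors = rng.standard_normal((n, d)).astype(np.float32)
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)  # L2-normalize rows

query = vectors[42]                  # query with a stored vector, so rank 1 is itself

# On normalized vectors, cosine similarity is just a dot product:
# one O(n*d) pass that touches every stored vector.
sims = vectors @ query               # shape (n,)

top_k = np.argsort(-sims)[:k]        # indices of the k most similar vectors
print(top_k, sims[top_k].round(4))
```

Every query pays the full O(n·d) scan; ANN indexes exist precisely to avoid touching all n vectors.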

HNSW: how fast ANN search works

Hierarchical Navigable Small World (HNSW, Malkov & Yashunin 2020) is the dominant ANN algorithm used by pgvector, Pinecone, Weaviate, Milvus, and Qdrant. It builds a multi-layer proximity graph:

  • Layer 0: dense graph — every vector is a node, connected to its M nearest neighbors
  • Layer 1+: progressively sparser — each layer contains an exponentially smaller fraction of the nodes below it (with the default level-assignment, roughly 1/M of the layer beneath)
  • Search: start at the top layer (few nodes, long-range jumps), greedily navigate toward query, descend to denser layers, refine at layer 0

HNSW with hnswlib — million-scale ANN in Python

import hnswlib
import numpy as np

DIM = 1536           # OpenAI text-embedding-3-large dimension
N_VECTORS = 1_000_000

# Build the index
index = hnswlib.Index(space='cosine', dim=DIM)
index.init_index(
    max_elements=N_VECTORS,
    ef_construction=200,   # higher = better recall, slower build
    M=16                   # edges per node; 16–64 is typical
)

# Add vectors (batch for speed)
vectors = np.random.randn(N_VECTORS, DIM).astype(np.float32)
ids     = np.arange(N_VECTORS)
index.add_items(vectors, ids)

# Set query-time exploration parameter
index.set_ef(50)   # ef >= k; higher = better recall, slower

# Query: find top-10 nearest neighbors
query = np.random.randn(DIM).astype(np.float32)
labels, distances = index.knn_query(query, k=10)

print(f"Top-10 neighbor IDs: {labels[0]}")
print(f"Cosine distances:    {distances[0].round(4)}")

# Save / load index
index.save_index("my_index.bin")
# index.load_index("my_index.bin")   # reload without rebuild

HNSW vs IVF

IVF (Inverted File Index) partitions vectors into clusters and only searches the closest clusters at query time. IVF uses far less memory than HNSW but has lower recall at the same speed. For production RAG at <50M vectors, HNSW is preferred. For >100M vectors where RAM is the bottleneck, IVF with product quantization (IVF-PQ) is typical.
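The IVF idea fits in a short NumPy sketch — this is an illustrative toy, not a production implementation (real engines like faiss add product quantization and optimized kernels): vectors are clustered with k-means at build time, and a query scans only the `nprobe` closest clusters instead of all n vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, nlist, nprobe, k = 5_000, 64, 32, 4, 3

vectors = rng.standard_normal((n, d)).astype(np.float32)

# --- Build: crude k-means (a few Lloyd iterations) to form nlist clusters ---
centroids = vectors[rng.choice(n, nlist, replace=False)].copy()
for _ in range(10):
    d2 = ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    assign = d2.argmin(1)                        # nearest centroid per vector
    for c in range(nlist):
        members = vectors[assign == c]
        if len(members):
            centroids[c] = members.mean(0)

# Final assignment against the final centroids
d2 = ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
assign = d2.argmin(1)

# Inverted lists: cluster id -> indices of its member vectors
inv_lists = {c: np.where(assign == c)[0] for c in range(nlist)}

# --- Search: scan only the nprobe closest clusters ---
query = rng.standard_normal(d).astype(np.float32)
probe = ((centroids - query) ** 2).sum(-1).argsort()[:nprobe]
candidates = np.concatenate([inv_lists[c] for c in probe])

dists = ((vectors[candidates] - query) ** 2).sum(-1)
top_k = candidates[dists.argsort()[:k]]
print("scanned", len(candidates), "of", n, "vectors; top-k ids:", top_k)
```

The recall/memory tradeoff is visible here: only the candidates in the probed clusters are scanned, so a true nearest neighbor sitting in an unprobed cluster is simply missed — raising `nprobe` trades speed for recall.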

Distance metrics: cosine vs Euclidean

The choice of distance metric fundamentally affects what 'similar' means. Three metrics dominate in practice:

Cosine similarity: angle between vectors, independent of magnitude. Range: [−1, 1]. 1 = identical direction, 0 = orthogonal, −1 = opposite. Best for text embeddings where length normalization is desired.

Euclidean (L2) distance: straight-line distance in embedding space. Sensitive to vector magnitude. Best when magnitude carries information (e.g., image descriptors).

Dot product: equivalent to cosine similarity when vectors are L2-normalized. Maximum Inner Product Search (MIPS) is used in recommendation systems where magnitude encodes popularity or confidence.

Computing all three distance metrics

import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def euclidean_distance(a, b):
    return np.linalg.norm(a - b)

def dot_product(a, b):
    return np.dot(a, b)

# Example with text-like embeddings (normalized)
a = np.array([0.8, 0.2, 0.5, -0.1])
b = np.array([0.7, 0.3, 0.4, 0.1])   # similar to a
c = np.array([-0.5, 0.9, -0.3, 0.8]) # different from a

print(f"Cosine(a,b) = {cosine_similarity(a,b):.4f}")   # high (similar)
print(f"Cosine(a,c) = {cosine_similarity(a,c):.4f}")   # low (different)
print(f"L2(a,b)    = {euclidean_distance(a,b):.4f}")   # small (similar)
print(f"L2(a,c)    = {euclidean_distance(a,c):.4f}")   # large (different)

# For normalized vectors: cosine_similarity = dot_product
a_norm = a / np.linalg.norm(a)
b_norm = b / np.linalg.norm(b)
print(f"Dot(norm_a, norm_b) = {dot_product(a_norm, b_norm):.4f}")
print(f"Cosine(a, b)         = {cosine_similarity(a, b):.4f}")

Vector database options in 2025

Database | Type                  | Best for                               | Notes
pgvector | PostgreSQL extension  | Existing Postgres stacks, <10M vectors | No separate infrastructure; HNSW + IVF; SQL-native
Pinecone | Managed cloud         | Production RAG, fastest setup          | Fully managed; high cost at scale; serverless tier available
Qdrant   | Open-source / cloud   | High-performance self-hosted           | Rust core; among the fastest open-source engines; sparse+dense hybrid search
Weaviate | Open-source / cloud   | Multimodal, knowledge graphs           | Modules for CLIP, text2vec; GraphQL interface
Milvus   | Open-source / cloud   | Billions of vectors, enterprise        | Distributed architecture; GPU-accelerated indexing
ChromaDB | Open-source (Python)  | Prototyping, development               | Zero setup; in-memory or persistent; not production-scale

Choosing a vector database

For most RAG applications with <5M chunks: pgvector in your existing Postgres. For production with 5M–100M vectors: Qdrant (self-hosted) or Pinecone (managed). For >100M vectors or multi-modal: Milvus. Don't prematurely migrate — pgvector with HNSW handles millions of vectors with sub-10ms latency on modest hardware.

Metadata filtering with vectors

Production RAG systems rarely do pure vector search — they combine semantic similarity with metadata filters to scope retrieval to the right documents, users, or date ranges:

pgvector: combining vector similarity with metadata filters

-- Create a table with embeddings + metadata
CREATE TABLE document_chunks (
    id         BIGSERIAL PRIMARY KEY,
    user_id    INTEGER NOT NULL,
    doc_id     INTEGER NOT NULL,
    page_num   INTEGER,
    chunk_text TEXT,
    embedding  VECTOR(1536),
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Create HNSW index on embedding column
CREATE INDEX ON document_chunks
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

-- Also index metadata for fast pre-filtering
CREATE INDEX ON document_chunks (user_id, doc_id);

-- ─── Filtered vector search ─────────────────────────────────────
-- Find top-5 chunks most similar to query_embedding,
-- but ONLY from doc_id=42 belonging to user_id=123, pages 30-60
SELECT
    id,
    chunk_text,
    page_num,
    1 - (embedding <=> '[0.1, 0.2, ..., 0.9]'::vector) AS similarity
FROM document_chunks
WHERE user_id = 123
  AND doc_id  = 42
  AND page_num BETWEEN 30 AND 60
ORDER BY embedding <=> '[0.1, 0.2, ..., 0.9]'::vector   -- cosine distance
LIMIT 5;

Pre-filtering vs post-filtering

Pre-filtering (apply the metadata filter before the ANN search) guarantees every result matches, but can hurt recall or speed if the filter leaves few candidate vectors. Post-filtering (ANN search first, then filter) may return fewer than k results if many of the top-k neighbors fail the filter. A common mitigation is to over-fetch: request a large k (e.g., k=100), apply the filter, then re-rank and truncate to the final k=5. In pgvector, filters are expressed as standard SQL WHERE clauses and combined with the HNSW index scan.
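The over-fetch pattern can be sketched in a few lines. This toy uses brute-force search as a stand-in for the ANN index, and a hypothetical `doc_ids` array as the metadata — both are illustrative, not tied to any specific database:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 2_000, 32
vectors = rng.standard_normal((n, d)).astype(np.float32)
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

# Hypothetical metadata: each chunk belongs to one of 10 documents
doc_ids = rng.integers(0, 10, size=n)

def filtered_search(query, target_doc, k=5, overfetch=100):
    """Over-fetch the top `overfetch` by similarity, then post-filter and truncate."""
    sims = vectors @ query
    candidates = np.argsort(-sims)[:overfetch]            # ANN stand-in (brute force here)
    keep = candidates[doc_ids[candidates] == target_doc]  # metadata post-filter
    return keep[:k], sims[keep[:k]]

query = vectors[0]                   # query with a stored chunk
ids, scores = filtered_search(query, target_doc=int(doc_ids[0]))
print(ids, scores.round(3))
```

If fewer than k of the over-fetched candidates survive the filter, the function simply returns fewer results — which is exactly the failure mode that makes `overfetch` a tunable knob rather than a fixed constant.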

Practice questions

  1. What is the geometric interpretation of dot product and why does it measure similarity? (Answer: A·B = ||A|| ||B|| cos(θ). If both vectors are unit length: A·B = cos(θ) — the cosine of the angle between them. When θ=0 (same direction): cos(0)=1, maximum similarity. When θ=90° (orthogonal): cos(90°)=0, no similarity. When θ=180° (opposite): cos(180°)=-1, maximum dissimilarity. For embeddings: similar concepts have small angular separation → high cosine similarity. This geometric interpretation is why cosine similarity is the standard metric for embedding comparison.)
  2. What is the difference between L1 norm, L2 norm, and L∞ norm of a vector? (Answer: L1 norm (Manhattan): ||v||₁ = Σ|vᵢ| — sum of absolute values. L2 norm (Euclidean): ||v||₂ = √(Σvᵢ²) — geometric length. L∞ norm (Chebyshev): ||v||∞ = max|vᵢ| — largest absolute value. Applications: L2 in distance and similarity calculations; L1 in sparse representations and LASSO; L∞ for adversarial perturbation bounds (maximum pixel change). Gradient clipping uses L2 norm; L1 norm regularisation promotes sparsity.)
  3. Why is it impossible to visualise embeddings directly, and what techniques help? (Answer: Embeddings are typically 768–3072 dimensional. Humans can perceive at most 3 dimensions. Dimensionality reduction techniques: PCA projects to top-2 principal components (preserves global variance structure). t-SNE (perplexity-based) preserves local neighbourhood structure — similar embeddings cluster visually. UMAP (topological) preserves both local and global structure better than t-SNE. All lose information. Use t-SNE/UMAP for exploring cluster structure; use PCA for understanding variance explained by dimensions.)
  4. What is a sparse vector vs a dense vector and when is each appropriate? (Answer: Sparse vector: most entries are zero — efficiently stored as (index, value) pairs. BM25/TF-IDF produces sparse vectors: only vocabulary words present in a document have non-zero values (vocabulary size: 50,000+, document has ~100 unique words → sparse). Dense vector: all entries are non-zero — neural embedding outputs. 768 floats all meaningful. Use sparse for: exact keyword matching, interpretable features, memory-constrained environments. Use dense for: semantic search, capturing contextual meaning, transfer learning.)
  5. What is vector arithmetic and what does it reveal about embedding spaces? (Answer: King - Man + Woman ≈ Queen. Embeddings encode relational structure as vector offsets. The ' - Man + Woman' operation is a vector translation in the gender direction. Similarly: Paris - France + Italy ≈ Rome (capital cities relationship). This arithmetic works because training on text that discusses relationships (kings and queens, capitals and countries) creates consistent directional encodings. Applications: analogy solving, bias detection (measuring direction of gender/race bias), concept interpolation, and zero-shot classification via concept vectors.)
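The three norms from question 2 are one-liners in NumPy — a quick worked check on a toy vector:

```python
import numpy as np

v = np.array([3.0, -4.0, 0.0, 1.0])

l1   = np.abs(v).sum()      # L1 / Manhattan:  3 + 4 + 0 + 1 = 8.0
l2   = np.linalg.norm(v)    # L2 / Euclidean:  sqrt(9 + 16 + 0 + 1) = sqrt(26) ≈ 5.10
linf = np.abs(v).max()      # L-inf / Chebyshev: largest |component| = 4.0

print(l1, round(float(l2), 2), linf)
```

Note the ordering L∞ ≤ L2 ≤ L1, which holds for every vector — each norm is a progressively looser aggregation of the components.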

On LumiChats

LumiChats uses pgvector (PostgreSQL vector extension via Supabase) for all vector operations. Document chunks and memory embeddings are stored and retrieved using cosine similarity. Metadata filters on user_id and document_id ensure each user only retrieves their own content.

