Hierarchical clustering builds a nested hierarchy of clusters, represented as a dendrogram (tree). Agglomerative (bottom-up): start with each point as its own cluster and repeatedly merge the two most similar clusters until only one remains. Divisive (top-down): start with all points in one cluster and recursively split into smaller clusters. The key design decision is the linkage criterion — how to measure the distance between two clusters. Unlike K-Means, hierarchical clustering does not require specifying K beforehand: you can cut the dendrogram at any level afterwards. GATE tests linkage types and complexity.
Real-life analogy: The family tree
Imagine organising people into a family tree. At the bottom are individual people. Moving up, siblings merge into a family unit. Families merge into extended families. Extended families merge into clans. Clans merge into a single human race at the top. Agglomerative clustering builds this bottom-up. A dendrogram visualises the entire tree — you can cut at any height to get any number of clusters you want.
Agglomerative clustering algorithm
- Start: n clusters, one per data point.
- Compute distance matrix: D(i,j) = distance between cluster i and cluster j.
- Merge: Find the pair of clusters with minimum distance. Merge them into one cluster.
- Update: Recompute distances from the new merged cluster to all others using the linkage criterion.
- Repeat the merge and update steps until one cluster remains. Record the merge history as a dendrogram.
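The loop above can be sketched from scratch. This is a minimal naive O(n³) implementation (function and variable names here are illustrative, not from any library) supporting single and complete linkage:

```python
import numpy as np

def naive_agglomerative(X, n_clusters, linkage="single"):
    """Naive agglomerative clustering; returns one integer label per point."""
    clusters = [[i] for i in range(len(X))]                   # start: one cluster per point
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)      # pairwise point distances

    def cluster_dist(a, b):
        d = D[np.ix_(a, b)]                                   # distances between the two groups
        return d.min() if linkage == "single" else d.max()

    while len(clusters) > n_clusters:
        # find the closest pair of clusters and merge them
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: cluster_dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]

    labels = np.empty(len(X), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels

X = np.array([[0., 0.], [0., 1.], [5., 5.], [5., 6.], [10., 0.]])
labels = naive_agglomerative(X, 3)
print(labels)  # → [0 0 1 1 2]
```

The two close pairs merge first, leaving the isolated point as its own cluster. The inner `min` over all cluster pairs is what makes the naive version cubic.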
| Linkage | Distance formula | Properties | When to use |
|---|---|---|---|
| Single linkage | min distance between any two points in clusters | Chaining effect — elongated clusters | Data with chains or non-spherical shapes |
| Complete linkage | max distance between any two points in clusters | Compact, spherical clusters | Well-separated, similar-sized clusters |
| Average linkage (UPGMA) | mean distance between all pairs | Compromise between single and complete | General purpose — often best |
| Ward linkage | increase in total WCSS from merging | Minimises variance — spherical clusters | Similar to K-Means; most commonly used |
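The first three formulas in the table can be checked by hand on two tiny clusters — a short sketch using NumPy (the cluster coordinates are made up for illustration):

```python
import numpy as np

A = np.array([[0., 0.], [0., 1.]])   # cluster A
B = np.array([[3., 0.], [4., 0.]])   # cluster B

# all pairwise distances between a point in A and a point in B
pair = np.linalg.norm(A[:, None] - B[None, :], axis=-1)

print("single  :", pair.min())    # min over all pairs → 3.0
print("complete:", pair.max())    # max over all pairs → sqrt(17) ≈ 4.123
print("average :", pair.mean())   # mean over all pairs (UPGMA) ≈ 3.571
```

Ward linkage is different in kind: it is not a min/max/mean over pairwise distances but the increase in total WCSS that the merge would cause.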
Agglomerative clustering with dendrogram
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.datasets import make_blobs

X, y_true = make_blobs(n_samples=30, centers=3,
                       cluster_std=0.8, random_state=42)

# ── scipy dendrogram (full hierarchy) ──
# Linkage types: 'single', 'complete', 'average', 'ward'
Z_ward = linkage(X, method='ward')
Z_single = linkage(X, method='single')
Z_complete = linkage(X, method='complete')
print("Ward linkage merging sequence (last 5):")
print(Z_ward[-5:])  # Columns: [cluster_i, cluster_j, distance, new_size]

# ── sklearn AgglomerativeClustering ──
for linkage_type in ['ward', 'complete', 'average', 'single']:
    # metric defaults to 'euclidean' — the only metric Ward supports
    agg = AgglomerativeClustering(n_clusters=3, linkage=linkage_type)
    labels = agg.fit_predict(X)
    # Purity (when true labels are known): for each cluster, count its
    # majority true label; np.bincount is robust across scipy versions,
    # unlike the changing scipy.stats.mode return format
    purity = sum(np.bincount(y_true[labels == k]).max() for k in range(3)) / len(X)
    print(f"{linkage_type:<10} purity: {purity:.3f}")
Cutting the dendrogram and comparing with K-Means
The dendrogram's y-axis shows the merge distance. Cutting horizontally at height h gives clusters — the number of vertical lines crossed equals the number of clusters at that cut. This is the key advantage: no need to specify K. You decide the resolution after seeing the full hierarchy.
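Cutting at a chosen height — or asking directly for a given number of clusters — is done with scipy's fcluster. A short sketch on the same make_blobs data as above (the cut height 10.0 is an arbitrary illustration):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=30, centers=3, cluster_std=0.8, random_state=42)
Z = linkage(X, method='ward')

labels_h = fcluster(Z, t=10.0, criterion='distance')  # cut at height h = 10.0
labels_k = fcluster(Z, t=3, criterion='maxclust')     # or ask for exactly 3 clusters

print("clusters at h=10.0:", len(np.unique(labels_h)))
print("clusters at k=3   :", len(np.unique(labels_k)))
```

Note that fcluster labels are 1-based, unlike sklearn's 0-based labels. The same fitted hierarchy serves every cut, so exploring different resolutions costs nothing extra.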
| Property | K-Means | Agglomerative (Ward) | DBSCAN |
|---|---|---|---|
| K required upfront? | Yes | No (post-hoc cut) | No |
| Cluster shape | Spherical only | Spherical (Ward) | Arbitrary |
| Handles outliers | No | Partially | Yes (noise class) |
| Time complexity | O(nKdT) | O(n² log n) Ward; O(n³) naive | O(n log n) with spatial index |
| Interpretability | Centroid description | Dendrogram | Core/border/noise distinction |
GATE key: Single linkage chaining
Single linkage suffers from the chaining effect: if two clusters have just one pair of close points, they merge regardless of how far apart the rest are. This creates long, thin, snake-like clusters that are often not meaningful. Complete linkage avoids this by requiring ALL points in the two clusters to be close before merging.
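The chaining effect is easy to reproduce: two tight blobs joined by a thin bridge of points. Single linkage walks along the bridge, so when asked for two clusters it cuts at a blob–bridge gap and keeps the bridge attached to one blob; complete linkage, by contrast, splits within the bridge (a small sketch — the data is made up for illustration):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
left   = rng.normal([0, 0], 0.1, size=(20, 2))    # tight blob on the left
right  = rng.normal([10, 0], 0.1, size=(20, 2))   # tight blob on the right
bridge = np.column_stack([np.linspace(1, 9, 15),  # thin chain of 15 points
                          np.zeros(15)])          # connecting the two blobs
X = np.vstack([left, bridge, right])              # 55 points total

sizes = {}
for link in ['single', 'complete']:
    labels = AgglomerativeClustering(n_clusters=2, linkage=link).fit_predict(X)
    sizes[link] = sorted(np.bincount(labels).tolist())
    print(link, sizes[link])
```

Single linkage yields sizes [20, 35]: one blob alone versus the other blob plus the whole bridge, because the largest nearest-neighbour gap is at a blob–bridge boundary. Complete linkage produces a more balanced split since it penalises the large max-distance of the chained cluster.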
Practice questions (GATE-style)
- What is the time complexity of naive agglomerative hierarchical clustering? (Answer: O(n³) — at each of n steps, compute all O(n²) pairwise distances. Optimised implementations (Ward with heap) achieve O(n² log n).)
- In complete linkage, the distance between two clusters is: (Answer: The maximum distance between any point in cluster A and any point in cluster B — the "furthest neighbour" approach.)
- Agglomerative clustering with Ward linkage minimises: (Answer: The increase in total within-cluster sum of squares (WCSS) at each merge — it minimises variance and tends to produce compact, spherical clusters.)
- A dendrogram is cut at height h=5.0, revealing 4 vertical line crossings. How many clusters result? (Answer: 4 clusters — each crossing corresponds to one cluster at that cut level.)
- When would you choose hierarchical over K-Means clustering? (Answer: When K is unknown in advance, when you need interpretable cluster merging history, when data is small (n<10000), or when the cluster shape is non-spherical and you use single-linkage.)
On LumiChats
Hierarchical clustering is used in NLP for concept taxonomy building: clustering word embeddings hierarchically creates a tree of related concepts. LumiChats uses similar ideas to organise topics within long documents for better RAG retrieval.