Hierarchical Clustering — Agglomerative & Divisive

Building a tree of clusters — no need to specify K in advance.


Definition

Hierarchical clustering builds a nested hierarchy of clusters represented as a dendrogram (tree). Agglomerative (bottom-up): start with each point as its own cluster, repeatedly merge the two most similar clusters until one remains. Divisive (top-down): start with all points in one cluster, recursively split into smaller clusters. The key decision is the linkage criterion — how to measure distance between clusters. Unlike K-Means, hierarchical clustering does not require specifying K beforehand — you cut the dendrogram at any level. GATE tests linkage types and complexity.

Real-life analogy: The family tree

Imagine organising people into a family tree. At the bottom are individual people. Moving up, siblings merge into a family unit. Families merge into extended families. Extended families merge into clans. Clans merge into a single human race at the top. Agglomerative clustering builds this bottom-up. A dendrogram visualises the entire tree — you can cut at any height to get any number of clusters you want.

Agglomerative clustering algorithm

  1. Start: n clusters, one per data point.
  2. Compute distance matrix: D(i,j) = distance between cluster i and cluster j.
  3. Merge: Find the pair of clusters with minimum distance. Merge them into one cluster.
  4. Update: Recompute distances from the new merged cluster to all others using the linkage criterion.
  5. Repeat steps 3–4 until one cluster remains. Record the merge history as a dendrogram.
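The loop above can be sketched directly in NumPy. This is a deliberately naive implementation with single linkage (it rescans all cross-cluster pairs at every merge); the function name `naive_agglomerative` and the toy data are illustrative, not a library API:

```python
import numpy as np

def naive_agglomerative(X, n_clusters=1):
    """Naive agglomerative clustering with single linkage.
    Returns the final clusters (as index lists) and the merge history."""
    clusters = [[i] for i in range(len(X))]        # step 1: one cluster per point
    merges = []
    while len(clusters) > n_clusters:              # step 5: repeat until done
        best = (None, None, np.inf)
        for i in range(len(clusters)):             # steps 2-3: scan all pairs
            for j in range(i + 1, len(clusters)):
                # single linkage: min distance over all cross-cluster point pairs
                d = min(np.linalg.norm(X[a] - X[b])
                        for a in clusters[i] for b in clusters[j])
                if d < best[2]:
                    best = (i, j, d)
        i, j, d = best
        merges.append((clusters[i], clusters[j], d))
        clusters[i] = clusters[i] + clusters[j]    # step 4: merge the closest pair
        del clusters[j]
    return clusters, merges

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))
clusters, merges = naive_agglomerative(X, n_clusters=2)
print(len(clusters), len(merges))   # 2 clusters after 6 merges of 8 points
```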
| Linkage | Distance formula | Properties | When to use |
|---|---|---|---|
| Single linkage | min distance between any two points in clusters | Chaining effect — elongated clusters | Data with chains or non-spherical shapes |
| Complete linkage | max distance between any two points in clusters | Compact, spherical clusters | Well-separated, similar-sized clusters |
| Average linkage (UPGMA) | mean distance between all pairs | Compromise between single and complete | General purpose — often best |
| Ward linkage | increase in total WCSS from merging | Minimises variance — spherical clusters | Similar to K-Means; most commonly used |
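The formulas in the table can be checked by hand on two toy clusters. The points below are made up so the numbers come out round; `wcss` is a helper defined here, not a library function:

```python
import numpy as np
from scipy.spatial.distance import cdist

# Two toy clusters on a line (chosen for easy arithmetic)
A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[4.0, 0.0], [9.0, 0.0]])

D = cdist(A, B)                     # all pairwise distances between A and B
print("single  :", D.min())         # closest pair: |1 - 4| = 3.0
print("complete:", D.max())         # farthest pair: |0 - 9| = 9.0
print("average :", D.mean())        # mean of the 4 pairwise distances = 6.0

# Ward: increase in total within-cluster sum of squares if A and B merge
def wcss(P):
    return ((P - P.mean(axis=0)) ** 2).sum()

merged = np.vstack([A, B])
print("ward delta-WCSS:", wcss(merged) - wcss(A) - wcss(B))   # 36.0
```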

Agglomerative clustering with dendrogram

import numpy as np
from sklearn.cluster import AgglomerativeClustering
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.datasets import make_blobs

X, y_true = make_blobs(n_samples=30, centers=3,
                        cluster_std=0.8, random_state=42)

# ── scipy dendrogram (full hierarchy) ──
# Linkage types: 'single', 'complete', 'average', 'ward'
Z_ward    = linkage(X, method='ward')
Z_single  = linkage(X, method='single')
Z_complete = linkage(X, method='complete')

print("Ward linkage merging sequence (last 5):")
print(Z_ward[-5:])   # Columns: [cluster_i, cluster_j, distance, new_size]

# ── sklearn AgglomerativeClustering ──
for linkage_type in ['ward', 'complete', 'average', 'single']:
    agg = AgglomerativeClustering(n_clusters=3, linkage=linkage_type)
    labels = agg.fit_predict(X)
    # Purity (when true labels are known): fraction of points that carry
    # their cluster's majority true label
    purity = sum(np.bincount(y_true[labels == k]).max()
                 for k in range(3)) / len(X)
    print(f"{linkage_type:<10} purity: {purity:.3f}")

Cutting the dendrogram and comparing with K-Means

The dendrogram's y-axis shows the merge distance. Cutting horizontally at height h gives clusters — the number of vertical lines crossed equals the number of clusters at that cut. This is the key advantage: no need to specify K. You decide the resolution after seeing the full hierarchy.
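In scipy, this cut is `scipy.cluster.hierarchy.fcluster`. A sketch on the same `make_blobs` setup as above — you can cut by height, or ask directly for a number of clusters after the fact:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=30, centers=3, cluster_std=0.8, random_state=42)
Z = linkage(X, method='ward')

# Cut at a fixed height h: one cluster per vertical line crossed at that height
labels_h = fcluster(Z, t=5.0, criterion='distance')
print("clusters at h=5.0:", len(np.unique(labels_h)))

# Or choose K post hoc without refitting anything
labels_k = fcluster(Z, t=3, criterion='maxclust')
print("clusters with maxclust=3:", len(np.unique(labels_k)))
```

Because `Z` stores the full merge history, both cuts reuse the same fitted hierarchy — this is exactly the "decide K after seeing the tree" advantage.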

| Property | K-Means | Agglomerative (Ward) | DBSCAN |
|---|---|---|---|
| K required upfront? | Yes | No (post-hoc cut) | No |
| Cluster shape | Spherical only | Spherical (Ward) | Arbitrary |
| Handles outliers | No | Partially | Yes (noise class) |
| Time complexity | O(nKdT) | O(n² log n) Ward; O(n³) naive | O(n log n) with spatial index |
| Interpretability | Centroid description | Dendrogram | Core/border/noise distinction |

GATE key: Single linkage chaining

Single linkage suffers from the chaining effect: if two clusters have just one pair of close points, they merge regardless of how far apart the rest are. This creates long, thin, snake-like clusters that are often not meaningful. Complete linkage avoids this by requiring ALL points in the two clusters to be close before merging.
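Chaining is easy to see on made-up data: two tight blobs joined by a thin bridge of points. Under single linkage the root of the dendrogram sits at a small height, because the bridge chains the blobs together link by link; under complete linkage the root stays near the full span of the data:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Two compact groups connected by a thin bridge (synthetic illustration)
rng = np.random.default_rng(0)
left   = rng.normal([0, 0], 0.2, size=(20, 2))
right  = rng.normal([10, 0], 0.2, size=(20, 2))
bridge = np.column_stack([np.linspace(1, 9, 17), np.zeros(17)])
X = np.vstack([left, right, bridge])

for method in ['single', 'complete']:
    Z = linkage(X, method=method)
    # Z[-1, 2] is the height of the final merge (the dendrogram root)
    print(f"{method:<8} final merge height: {Z[-1, 2]:.2f}")
```

With single linkage there is no height at which a cut separates the two blobs cleanly, which is the chaining effect in dendrogram form.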

Practice questions (GATE-style)

  1. What is the time complexity of naive agglomerative hierarchical clustering? (Answer: O(n³) — at each of n steps, compute all O(n²) pairwise distances. Optimised implementations (Ward with heap) achieve O(n² log n).)
  2. In complete linkage, the distance between two clusters is: (Answer: The maximum distance between any point in cluster A and any point in cluster B — the "furthest neighbour" approach.)
  3. Agglomerative clustering with Ward linkage minimises: (Answer: The increase in total within-cluster sum of squares (WCSS) at each merge — it minimises variance and tends to produce compact, spherical clusters.)
  4. A dendrogram is cut at height h=5.0, revealing 4 vertical line crossings. How many clusters result? (Answer: 4 clusters — each crossing corresponds to one cluster at that cut level.)
  5. When would you choose hierarchical over K-Means clustering? (Answer: When K is unknown in advance, when you need interpretable cluster merging history, when data is small (n<10000), or when the cluster shape is non-spherical and you use single-linkage.)

On LumiChats

Hierarchical clustering is used in NLP for concept taxonomy building: clustering word embeddings hierarchically creates a tree of related concepts. LumiChats uses similar ideas to organise topics within long documents for better RAG retrieval.
