
Principal Component Analysis (PCA)

Compressing high-dimensional data while preserving as much variance as possible.


Definition

Principal Component Analysis (PCA) is an unsupervised dimensionality reduction technique that transforms features into a new orthogonal coordinate system — the principal components — ordered by the amount of variance they explain. The first principal component (PC1) captures the direction of maximum variance; PC2 captures maximum remaining variance perpendicular to PC1, and so on. PCA reduces dimensions while retaining the most information, removes correlated features, and speeds up downstream ML. PCA is one of the highest-weightage topics in GATE DS&AI — typically 3–4 marks, often as numerical questions.

Real-life analogy: The shadow on the wall

Imagine a 3D object with a torch shining on it from different angles. Some angles cast a very informative shadow (showing the object's shape well). Other angles cast a small, uninformative shadow. PCA finds the angle that casts the most informative shadow — the direction that captures maximum variation in the data. The shadow is the low-dimensional projection.

PCA algorithm — step by step

  1. Standardise: Centre and scale features to zero mean and unit variance (StandardScaler). Critical — PCA is scale-sensitive.
  2. Compute covariance matrix: Σ = (1/(n−1)) XᵀX (X already mean-centred; some texts use 1/n, and the eigenvectors are identical either way). Σ is (p×p) symmetric positive semi-definite.
  3. Eigendecomposition: Decompose Σ = VΛVᵀ where columns of V are eigenvectors (principal components) and Λ is diagonal with eigenvalues λ₁ ≥ λ₂ ≥ ... ≥ λₚ.
  4. Select top-K eigenvectors: Choose K eigenvectors with largest eigenvalues — they explain the most variance.
  5. Project: Transform data onto the K principal components: Z = XV_K (where V_K is the p×K matrix of top-K eigenvectors).

Explained variance ratio

Each eigenvalue divided by the sum of all eigenvalues gives the proportion of total variance explained by that component; the ratios sum to 1 (100%). Choose K so that the cumulative explained variance is ≥ 95% (or 99%). This is how GATE tests PCA: given eigenvalues, compute explained variance ratios and choose the number of components.
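The "cumulative variance ≥ 95%" rule can be sketched in a few lines of NumPy. The eigenvalues below are made up for illustration; with real data they would come from the eigendecomposition of the covariance matrix.

```python
import numpy as np

# Hypothetical eigenvalues, already sorted in descending order
eigenvalues = np.array([5.0, 3.0, 2.0, 1.0, 0.5, 0.3, 0.2])

ratios = eigenvalues / eigenvalues.sum()   # explained variance ratio per PC
cumulative = np.cumsum(ratios)             # running total, ends at 1.0

# Smallest K whose cumulative explained variance reaches 95%
K = int(np.searchsorted(cumulative, 0.95) + 1)
print("Ratios:", ratios.round(3))
print("Cumulative:", cumulative.round(3))
print("K for >= 95% variance:", K)
```

`np.searchsorted` returns the first index where the cumulative sum crosses the threshold, which is exactly the component count minus one.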

PCA from scratch using eigendecomposition + sklearn

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data     # 150×4 (sepal length, sepal width, petal length, petal width)

# ── PCA from scratch ──
# Step 1: Standardise
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: Covariance matrix
Sigma = np.cov(X_scaled, rowvar=False)  # (4×4)
print("Covariance matrix shape:", Sigma.shape)

# Step 3: Eigendecomposition
eigenvalues, eigenvectors = np.linalg.eigh(Sigma)
# Sort descending (eigh returns ascending)
idx = np.argsort(eigenvalues)[::-1]
eigenvalues  = eigenvalues[idx]
eigenvectors = eigenvectors[:, idx]

print("Eigenvalues:", eigenvalues.round(3))
print("Explained variance ratio:", (eigenvalues/eigenvalues.sum()).round(3))
print("Cumulative explained variance:", np.cumsum(eigenvalues/eigenvalues.sum()).round(3))

# Step 4: Project onto top-2 PCs
V2 = eigenvectors[:, :2]        # (4×2)
Z  = X_scaled @ V2              # (150×2)
print(f"Original shape: {X.shape} → Reduced: {Z.shape}")

# ── sklearn PCA ──
pca = PCA(n_components=2)
scaler = StandardScaler()
X_pca = pca.fit_transform(scaler.fit_transform(X))
print(f"sklearn explained variance: {pca.explained_variance_ratio_.round(3)}")
# PC1 explains ~73%, PC2 explains ~23% → 2 PCs capture ~96% of variance!

Scree plot and choosing K

A scree plot plots eigenvalues (or explained variance ratios) in decreasing order. Look for the 'elbow' — where variance drops sharply and then levels off. Components before the elbow are meaningful; those after are noise. The 95% cumulative variance rule is more objective: choose K such that the first K components explain ≥ 95% of total variance.
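A scree plot can be drawn directly from sklearn's `explained_variance_ratio_`. This sketch (assuming matplotlib is installed; the `Agg` backend and output filename are arbitrary choices) plots both the per-component and cumulative curves for the iris data:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script just saves a file
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = StandardScaler().fit_transform(load_iris().data)
pca = PCA().fit(X)  # keep ALL components so the full scree curve is visible

ratios = pca.explained_variance_ratio_
components = np.arange(1, len(ratios) + 1)

plt.plot(components, ratios, "o-", label="per-component")
plt.plot(components, np.cumsum(ratios), "s--", label="cumulative")
plt.axhline(0.95, color="grey", linestyle=":", label="95% threshold")
plt.xlabel("Principal component")
plt.ylabel("Explained variance ratio")
plt.title("Scree plot (iris)")
plt.legend()
plt.savefig("scree_plot.png")
```

The elbow and the 95% line usually agree on small datasets; when they disagree, the cumulative-variance rule is the more defensible choice.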

PCA is not feature selection

PCA creates NEW features (linear combinations of original features) — it does not select original features. Principal components are directions in feature space, not individual original features. If interpretability matters, use feature selection methods (Lasso, mutual information) instead. PCA components have no direct physical meaning.
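The distinction is easy to see by inspecting `components_`: each principal component mixes all four iris features with non-zero weights rather than picking a subset of them. A minimal check:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

iris = load_iris()
X = StandardScaler().fit_transform(iris.data)
pca = PCA(n_components=2).fit(X)

# Each row of components_ is a unit-length linear combination of ALL four
# original features, not a selection of individual columns.
for i, pc in enumerate(pca.components_, start=1):
    weights = ", ".join(f"{name}: {w:+.2f}"
                        for name, w in zip(iris.feature_names, pc))
    print(f"PC{i} weights: {weights}")

print("PC1 has unit length:",
      bool(np.isclose(np.linalg.norm(pca.components_[0]), 1.0)))
```

A Lasso or mutual-information selector would instead return a mask over the original columns, which is why those methods keep interpretability and PCA does not.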

Practice questions (GATE-style)

  1. A dataset has 10 features. Eigenvalues of the covariance matrix are [5, 3, 2, 1, 0.5, 0.3, 0.1, 0.1, 0.05, 0.05]. How many PCs capture ≥ 90% variance? (Answer: Total = 12.1. Cumulative sums: 5 (41.3%), 8 (66.1%), 10 (82.6%), 11 (90.9%). Three PCs capture ~83%, four capture ~91% → 4 PCs are needed for ≥ 90%.)
  2. Why must data be standardised before PCA? (Answer: PCA maximises variance. Features with large measurement scales (e.g., salary in thousands vs. age in years) dominate the covariance matrix. Standardising ensures all features contribute equally.)
  3. Principal components are always: (Answer: Orthogonal to each other (perpendicular in feature space) — they are uncorrelated by construction. This is the key property that makes PCA useful for removing correlation.)
  4. If all features are already uncorrelated, what does PCA produce? (Answer: Principal components aligned with the original feature axes — PCA finds nothing new. The eigenvectors are the standard basis vectors. PCA is most useful when features are correlated.)
  5. PCA reduces a 1000-feature dataset to 10 PCs. A new test point has 1000 features. How do you project it? (Answer: Subtract the training mean, divide by training std (same scaler), then multiply by the 1000×10 matrix of top-10 eigenvectors. Never refit the scaler or PCA on test data.)
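The projection rule in question 5 is what sklearn's `transform` does under the hood: reuse the training-set scaler and eigenvectors, never refit on test data. A sketch on iris (the "new" measurement below is a hypothetical flower, not a real test set):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X_train = load_iris().data

# Fit the scaler and PCA on TRAINING data only
scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=2).fit(scaler.transform(X_train))

# Hypothetical new flower: sepal length/width, petal length/width (cm)
x_new = np.array([[5.1, 3.5, 1.4, 0.2]])

# Project: training mean/std, then the training eigenvectors — no refitting
z_new = pca.transform(scaler.transform(x_new))
print("Projected shape:", z_new.shape)
```

Calling `fit` (or `fit_transform`) on the test point instead would leak test statistics into the transformation and make the projection incomparable with the training set's.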

On LumiChats

PCA is used extensively in LLM research: visualising high-dimensional token embeddings in 2D, understanding what directions in embedding space correspond to semantic properties (e.g., the sentiment direction, the formality direction), and compressing model activations for efficient inference.

