
Exact Inference: Variable Elimination

Computing exact probabilities from Bayesian networks by systematically summing out hidden variables.


Definition

Variable Elimination (VE) is the standard algorithm for exact probabilistic inference in Bayesian networks. Given query variables Q, evidence variables E=e, and hidden variables H, VE computes P(Q|E=e) by: (1) reducing all CPTs consistent with the evidence, (2) multiplying factors that share variables, and (3) summing out (marginalizing) each hidden variable in turn. The elimination order critically determines computational cost. GATE DA tests factor multiplication, variable summation, and the relationship between elimination order and treewidth.

Inference problem formulation

The inference task: given a Bayesian network representing P(X₁,…,Xₙ), query variables Q ⊆ {X₁,…,Xₙ}, and observed evidence E=e, compute the posterior P(Q | E=e).

The inference sum: P(Q | E=e) = α · Σ_h P(Q, E=e, H=h), where α = 1/P(E=e) is the normalizing constant. Evaluated naively, this sum touches exponentially many entries of the joint distribution; variable elimination makes it tractable by interleaving multiplication and marginalization.

Factors and factor operations

A factor f(X₁,…,Xₖ) is a function mapping assignments of its variables to non-negative real numbers. CPTs are factors. Variable elimination uses two operations:

| Operation | Symbol | Definition | Result |
|---|---|---|---|
| Factor product (pointwise) | f₁ × f₂ | Multiply corresponding entries for all shared variable assignments | New factor over the union of variables from f₁ and f₂ |
| Factor marginalization (summing out) | Σ_Y f | For each assignment of the remaining variables, sum over all values of Y | New factor with Y eliminated |

Factor product and marginalization — the two core VE operations

import numpy as np

# f1(A,B): shape (2,2) — rows=A values, cols=B values
f1 = np.array([[0.2, 0.8],   # P(B=0|A=0), P(B=1|A=0)
                [0.7, 0.3]])  # P(B=0|A=1), P(B=1|A=1)

# f2(B,C): shape (2,2) — rows=B values, cols=C values
f2 = np.array([[0.4, 0.6],
                [0.9, 0.1]])

# --- Factor PRODUCT of f1(A,B) and f2(B,C) → f3(A,B,C) shape (2,2,2) ---
f3 = f1[:, :, np.newaxis] * f2[np.newaxis, :, :]

# --- Factor MARGINALIZATION: sum out B from f3 → f4(A,C) ---
f4 = f3.sum(axis=1)   # axis=1 is the B dimension
print("Factor f4(A,C) after summing out B:")
print(f4)

# --- After all hidden variables are summed out: NORMALIZE ---
# P(Q=q | E=e) = f_final[q] / f_final.sum()

Variable elimination algorithm step by step

The VE algorithm:

  1. Start with all CPTs as the initial factors.
  2. Restrict every factor to the evidence values E=e.
  3. For each hidden variable Z in the elimination order: collect all factors mentioning Z, multiply them into one factor, then sum out Z.
  4. Multiply all remaining factors into one.
  5. Normalize.
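The loop above can be sketched in a few lines of NumPy. This is an illustrative implementation, not a library API: a factor is simply a `(variables, table)` pair, the variable names and the toy A→B network at the bottom are made up for the example.

```python
import numpy as np

# A factor is a pair (vars, table): vars is a tuple of variable names,
# table is a NumPy array with one axis per variable, in that order.

def multiply(f1, f2):
    """Pointwise product over the union of the two factors' variables."""
    (v1, t1), (v2, t2) = f1, f2
    out = tuple(dict.fromkeys(v1 + v2))          # ordered union of variables
    ax = {v: chr(97 + i) for i, v in enumerate(out)}
    spec = ("".join(ax[v] for v in v1) + "," +
            "".join(ax[v] for v in v2) + "->" +
            "".join(ax[v] for v in out))
    return out, np.einsum(spec, t1, t2)

def sum_out(var, factor):
    """Marginalize one variable out of a factor."""
    vs, t = factor
    return tuple(v for v in vs if v != var), t.sum(axis=vs.index(var))

def eliminate(factors, hidden):
    """For each hidden variable: combine all factors mentioning it, sum it out."""
    factors = list(factors)
    for z in hidden:
        touching = [f for f in factors if z in f[0]]
        factors = [f for f in factors if z not in f[0]]
        prod = touching[0]
        for f in touching[1:]:
            prod = multiply(prod, f)
        factors.append(sum_out(z, prod))
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    vs, t = result
    return vs, t / t.sum()                        # final normalization

# Toy query on the chain A -> B: compute P(B) by eliminating A
fA = (("A",), np.array([0.6, 0.4]))
fBA = (("A", "B"), np.array([[0.7, 0.3], [0.2, 0.8]]))  # P(B|A), rows: A
print(eliminate([fA, fBA], hidden=["A"]))  # P(B) = [0.5, 0.5]
```

The einsum subscript trick handles the axis bookkeeping that the hand-written broadcasting in the earlier example does manually.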

VE trace: P(Alarm=T | JohnCalls=T, MaryCalls=T) — Burglary network

Variables: B, E, A, J, M
Query: A  |  Evidence: J=T, M=T  |  Hidden to eliminate: B, E

Initial factors (restrict to evidence):
  f1(B)       = P(B)               [no change — B not in evidence]
  f2(E)       = P(E)               [no change]
  f3(A,B,E)   = P(A|B,E)          [no change]
  f4(A)       = P(J=T|A)          [J fixed to T, reduces to function of A]
  f5(A)       = P(M=T|A)          [M fixed to T, reduces to function of A]

Step 1 — Eliminate E:
  Factors containing E: {f2(E), f3(A,B,E)}
  Product: f6(A,B,E) = f2(E) × f3(A,B,E)   [shape: 2×2×2]
  Sum out E: f7(A,B) = ΣE f6(A,B,E)         [shape: 2×2]

Step 2 — Eliminate B:
  Factors containing B: {f1(B), f7(A,B)}
  Product: f8(A,B) = f1(B) × f7(A,B)        [shape: 2×2]
  Sum out B: f9(A) = ΣB f8(A,B)              [shape: 2]

Remaining factors: f9(A), f4(A), f5(A)
Product: f10(A) = f9(A) × f4(A) × f5(A)
Normalize: P(A=T|J=T,M=T) = f10(T) / (f10(T) + f10(F)) ≈ 0.76
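The trace can be checked numerically. The CPT values below are the standard ones for the Russell & Norvig burglary network (not stated in the trace itself, so treat them as an assumption); with them, the two elimination steps reproduce the ≈ 0.76 answer.

```python
import numpy as np

# Standard burglary-network CPTs (Russell & Norvig); index 0 = False, 1 = True
P_B = np.array([0.999, 0.001])
P_E = np.array([0.998, 0.002])
# P(A | B, E), indexed [B, E, A]
P_A = np.array([[[0.999, 0.001],   # B=F, E=F
                 [0.71,  0.29 ]],  # B=F, E=T
                [[0.06,  0.94 ],   # B=T, E=F
                 [0.05,  0.95 ]]]) # B=T, E=T
P_J_given_A = np.array([0.05, 0.90])  # f4(A) = P(J=T | A)
P_M_given_A = np.array([0.01, 0.70])  # f5(A) = P(M=T | A)

# Step 1 - eliminate E: f7(A,B) = sum_E P(E) * P(A|B,E)
f7 = np.einsum('e,bea->ab', P_E, P_A)
# Step 2 - eliminate B: f9(A) = sum_B P(B) * f7(A,B)
f9 = np.einsum('b,ab->a', P_B, f7)
# Multiply remaining factors and normalize
f10 = f9 * P_J_given_A * P_M_given_A
posterior = f10 / f10.sum()
print(f"P(A=T | J=T, M=T) = {posterior[1]:.4f}")  # 0.7607
```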

Complexity and elimination ordering

The cost of VE depends on the size of the largest intermediate factor created. This is determined by the elimination order — finding the optimal order is NP-hard, but practical heuristics work well.

| Ordering heuristic | Strategy | Performance |
|---|---|---|
| Min-fill | Eliminate the variable that creates the fewest new edges (smallest new factor) | Best practical heuristic; near-optimal in most cases |
| Min-degree | Eliminate the variable with the fewest current neighbours | Fast to compute; reasonable quality |
| Topological | Eliminate in topological order (roots first) | Simple; often suboptimal for inference |
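The min-degree heuristic is simple enough to sketch. The function below greedily picks the lowest-degree variable from an undirected (moralized) graph; the adjacency dictionary is the moral graph of the burglary network, where B and E gain a "moral" edge because they are co-parents of A.

```python
def min_degree_order(adjacency):
    """Greedy min-degree elimination order on an undirected (moralized) graph.
    adjacency: dict mapping each variable to the set of its neighbours."""
    adj = {v: set(ns) for v, ns in adjacency.items()}
    order = []
    while adj:
        # pick the variable with the fewest current neighbours
        v = min(adj, key=lambda x: len(adj[x]))
        order.append(v)
        # eliminating v connects all its neighbours (fill-in edges)
        nbrs = adj.pop(v)
        for u in nbrs:
            adj[u].discard(v)
            adj[u] |= (nbrs - {u})
    return order

# Moral graph of the burglary network (B-E edge added: co-parents of A)
moral = {
    "B": {"A", "E"}, "E": {"A", "B"},
    "A": {"B", "E", "J", "M"}, "J": {"A"}, "M": {"A"},
}
print(min_degree_order(moral))  # leaves J, M go first; hub A goes last
```

Tracking the largest neighbour set seen during this loop also gives the width of the chosen order, i.e. an upper bound on the treewidth.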

VE complexity: with n variables of domain size k and an elimination order of width w (the treewidth, for an optimal order), VE runs in O(n · k^(w+1)) time and space, since the largest intermediate factor has at most w+1 variables. VE is polynomial when w is bounded and exponential in the worst case.

Treewidth and tractability

The treewidth w determines the cost of exact inference. Networks with small treewidth support efficient inference: chains and trees have w=1, giving O(n·k²) inference, and polytrees have w bounded by the largest number of parents of any node. Dense networks have large treewidth, making exact inference intractable — this is when approximate inference (see the Sampling entry) becomes necessary.

Belief propagation on trees

When the Bayesian network is a tree, the Sum-Product (Belief Propagation) algorithm computes exact marginals for all variables simultaneously in O(n) time by passing messages up and down the tree — far more efficient than running VE separately for each query variable.
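On a chain (the simplest tree), the two message passes look like this. The CPT numbers are illustrative, not from the text; with evidence C=1, one forward pass and one backward pass yield the posteriors of both A and B at once, which is the efficiency gain over re-running VE per query.

```python
import numpy as np

# Chain A -> B -> C, binary variables; illustrative CPTs (assumed for the example)
P_A  = np.array([0.6, 0.4])
P_BA = np.array([[0.7, 0.3], [0.2, 0.8]])   # P(B|A), rows: A
P_CB = np.array([[0.9, 0.1], [0.4, 0.6]])   # P(C|B), rows: B

# Forward (downstream) message: prior information flowing A -> B
pi_B = P_A @ P_BA                  # pi_B[b] = sum_a P(a) P(b|a)

# Backward (upstream) messages from the evidence C=1
lam_B = P_CB[:, 1]                 # lam_B[b] = P(C=1 | b)
lam_A = P_BA @ lam_B               # lam_A[a] = sum_b P(b|a) lam_B[b]

# One pass in each direction yields ALL posteriors at once
post_A = P_A * lam_A;  post_A /= post_A.sum()
post_B = pi_B * lam_B; post_B /= post_B.sum()
print(post_A)   # [3/7, 4/7]
print(post_B)   # [1/7, 6/7]
```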

For networks with cycles, Loopy Belief Propagation (LBP) applies the same message-passing algorithm on the cyclic graph. LBP is not guaranteed to converge or give exact answers, but empirically works well in many practical applications — making it an important approximate inference method.

GATE PYQ Spotlight

GATE DA — Inference on a chain network (NAT, 2-mark)

Given a Bayesian network A→B→C→D with binary variables and specific CPTs, compute P(A=1|D=1). Approach using VE: Start with factors P(A), P(B|A), P(C|B), P(D=1|C). First eliminate B: multiply P(B|A)×P(C|B), sum out B → factor f(A,C). Then eliminate C: multiply f(A,C)×P(D=1|C), sum out C → factor f(A). Finally: P(A=1|D=1) = P(A=1)·f(A=1) / [P(A=1)·f(A=1) + P(A=0)·f(A=0)].
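The same chain recipe in code. The actual GATE question supplies its own CPT numbers; the values below are invented stand-ins, so only the structure of the computation carries over.

```python
import numpy as np

# Chain A -> B -> C -> D; illustrative CPTs (the real question gives its own)
P_A    = np.array([0.7, 0.3])                  # P(A)
P_BA   = np.array([[0.8, 0.2], [0.3, 0.7]])    # P(B|A), rows: A
P_CB   = np.array([[0.9, 0.1], [0.4, 0.6]])    # P(C|B), rows: B
P_D1_C = np.array([0.2, 0.5])                  # P(D=1|C), evidence-restricted

# Eliminate B: f(A,C) = sum_B P(B|A) P(C|B)
f_AC = P_BA @ P_CB
# Eliminate C: f(A) = sum_C f(A,C) P(D=1|C)
f_A = f_AC @ P_D1_C
# Combine with P(A) and normalize
post = P_A * f_A
post /= post.sum()
print(f"P(A=1 | D=1) = {post[1]:.4f}")
```

Note that on a chain every elimination is just a matrix-vector or matrix-matrix product, which is why chain inference is O(n·k²).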

GATE DA — Factor operations in VE (1-mark)

In variable elimination for Bayesian network inference, when eliminating variable X from the factor list:
(A) Multiply only the CPT of X with its children's CPTs
(B) Multiply all factors containing X together, then sum out X
(C) Sum out X from each factor separately
(D) Replace all factors containing X with a uniform distribution

Answer: (B). All factors mentioning X are combined into a single joint factor first, then X is marginalized. This order is critical: summing X out of each factor separately gives incorrect results.

Practice questions

  1. What is the computational complexity of variable elimination and what factors affect it? (Answer: Complexity is exponential in the width of the elimination order (size of the largest factor created during elimination). For a Bayesian network with n binary variables and an optimal elimination order, the complexity is O(n · 2^w) where w is the treewidth. For polytrees (singly-connected networks): O(n) — linear. For highly connected networks (dense graphs): can be exponential — exact inference is intractable, requiring approximate methods (MCMC, belief propagation).)
  2. What is the connection between variable elimination and dynamic programming? (Answer: Both exploit the same principle: common subproblems are solved once and cached. VE caches intermediate factor products — the factor for a variable once computed is reused across all queries. DP on a sequence (e.g., Viterbi algorithm for HMMs) computes maximum probability paths using the same principle. The Viterbi algorithm IS variable elimination applied to a chain-structured Bayesian network (HMM). Both trade memory (storing cached factors) for time (avoiding recomputation).)
  3. What is the difference between exact inference (variable elimination, belief propagation) and approximate inference (MCMC)? (Answer: Exact inference: computes the exact posterior probability. VE and belief propagation on tree-structured models are exact. MCMC (Markov Chain Monte Carlo): generates samples from the posterior distribution — computes approximate marginals from sample frequencies. Accuracy improves with more samples. Use exact when: network is sparse, queries are frequent. Use MCMC when: network is dense, exact inference is intractable, approximate answers are sufficient.)
  4. What is a Markov blanket and why is it important for variable elimination? (Answer: Markov blanket of node X: parents, children, and co-parents (other parents of X's children). X is conditionally independent of all other nodes given its Markov blanket. Importance for VE: when eliminating X, only factors involving X's Markov blanket members need to be combined — factors involving other variables are irrelevant. This local structure is what makes VE tractable on sparse networks.)
  5. What is belief propagation and how does it relate to variable elimination? (Answer: Belief propagation (message passing): each node sends messages to its neighbours summarising its local probability information. For tree-structured graphs: exact algorithm equivalent to running VE simultaneously for all variables — O(n) rather than O(n·VE_per_query). For graphs with loops (LBP — Loopy Belief Propagation): approximate — iterates until convergence, no guarantee of exactness. BP is the algorithm underlying turbo codes, LDPC codes, and many probabilistic graphical model inference systems.)

