A Bayesian network (also called a belief network or Bayes net) is a directed acyclic graph (DAG) where each node represents a random variable and each directed edge represents a direct probabilistic influence. Each node stores a Conditional Probability Table (CPT) that quantifies P(Xᵢ | Parents(Xᵢ)). Bayesian networks compactly represent the full joint probability distribution over all variables by exploiting conditional independence. They are the most important probabilistic graphical model in GATE DA, appearing in questions on joint probability calculation, d-separation, inference, and sampling.
Probability revision (prerequisites)
Key prerequisites: P(A|B) = P(A∩B)/P(B) (conditional probability), product rule P(A∩B) = P(A|B)P(B), Bayes' theorem P(A|B) = P(B|A)P(A)/P(B), law of total probability P(B) = Σ_a P(B|A=a)P(A=a), and independence: A ⊥ B iff P(A,B) = P(A)P(B).
Chain rule for probability: the full joint distribution factors into a product of conditional probabilities. Bayesian networks exploit this by assuming each variable only depends on its parents.
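As a quick sketch, the prerequisite identities can be checked numerically on a two-variable toy example (all probability values below are made up for illustration):

```python
# Toy values (assumed) illustrating the product rule, the law of
# total probability, and Bayes' theorem for binary events A and B.
p_a = 0.3                # P(A)
p_b_given_a = 0.8        # P(B|A)
p_b_given_not_a = 0.2    # P(B|not-A)

p_ab = p_b_given_a * p_a                                   # product rule: P(A∩B)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)      # total probability
p_a_given_b = p_ab / p_b                                   # Bayes' theorem
print(round(p_b, 3), round(p_a_given_b, 3))                # 0.38 0.632
```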
Bayesian network structure
The fundamental factorization of a Bayesian network with n nodes is:
P(X₁, …, Xₙ) = ∏ᵢ P(Xᵢ | Parents(Xᵢ))
Each variable is conditionally independent of its non-descendants given its parents (the local Markov property). This reduces the exponential O(2ⁿ) size of the full joint table to a parameter count that is linear in n whenever each node has a bounded number of parents.
For a node Xᵢ with k binary parents, the CPT has 2^k rows. For a node with no parents, the CPT is just P(Xᵢ). A Bayesian network with n binary variables with at most k parents each requires O(n · 2^k) parameters vs. O(2^n) for the full joint table.
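The parameter-count comparison can be sketched directly; `bn_params` and `joint_params` are hypothetical helper names, counting one free parameter per CPT row for binary variables:

```python
# O(n · 2^k) CPT parameters for n binary variables with at most k
# parents each, vs. the full joint table over n binary variables.
def bn_params(n, k):
    return n * 2 ** k          # one row per parent configuration, per node

def joint_params(n):
    return 2 ** n - 1          # full table, minus one for normalization

for n, k in [(10, 2), (20, 3), (30, 4)]:
    print(n, k, bn_params(n, k), joint_params(n))
```

Even for 30 variables with at most 4 parents each, the network needs 480 parameters where the full table needs over a billion.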
| Structure | Independence encoded | Joint factorization |
|---|---|---|
| A → B → C (chain) | A ⊥ C \| B | P(A,B,C) = P(A)·P(B\|A)·P(C\|B) |
| A ← B → C (fork/common cause) | A ⊥ C \| B | P(A,B,C) = P(B)·P(A\|B)·P(C\|B) |
| A → C ← B (collider/v-structure) | A ⊥ B marginally; A ⊥̸ B \| C | P(A,B,C) = P(A)·P(B)·P(C\|A,B) |
The collider: explaining away
In the v-structure A → C ← B, A and B are marginally independent. But conditioning on C (the common effect) creates a dependency between A and B — this is "explaining away" (Berkson's paradox). Example: if an alarm rings (C), learning there was a burglary (A) makes an earthquake (B) less likely. GATE frequently tests this counterintuitive independence property.
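A numeric sketch of explaining away, using assumed priors and alarm CPT values in the spirit of the burglary (A), earthquake (B), alarm (C) example:

```python
# Collider A -> C <- B with assumed CPTs: conditioning on C makes
# the two marginally independent causes dependent.
p_a, p_b = 0.01, 0.02            # assumed priors for burglary, earthquake

def p_c(a, b):                   # assumed P(C=1 | a, b)
    return {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}[(a, b)]

# Full joint over (a, b) with C = 1, via the factorization.
joint = {(a, b): (p_a if a else 1 - p_a) * (p_b if b else 1 - p_b) * p_c(a, b)
         for a in (0, 1) for b in (0, 1)}
z = sum(joint.values())

p_b_given_c = sum(v for (a, b), v in joint.items() if b) / z   # P(B=1 | C=1)
p_b_given_c_a = joint[(1, 1)] / (joint[(1, 1)] + joint[(1, 0)])  # P(B=1 | C=1, A=1)
print(round(p_b_given_c, 4), round(p_b_given_c_a, 4))
```

With these numbers the earthquake posterior drops sharply once the burglary is known: the burglary has explained the alarm away.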
D-separation and conditional independence
D-separation (direction-dependent separation) is the graphical criterion for reading conditional independence off the DAG structure. A path between X and Y is blocked by a set of evidence nodes Z if any of the following holds for some node M on the path:
- Chain (X→M→Y or X←M←Y): path blocked if M ∈ Z (conditioning on the middle node blocks the path)
- Fork (X←M→Y): path blocked if M ∈ Z (conditioning on the common cause blocks the path)
- Collider (X→M←Y): path blocked if M ∉ Z AND no descendant of M ∈ Z (the collider OPENS the path when M is observed, closing it otherwise)
If ALL paths between X and Y are d-separated by Z, then X ⊥ Y | Z (conditional independence). The Markov blanket of a node consists of its parents, its children, and its children's other parents — given its Markov blanket, a node is conditionally independent of every other node.
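D-separation can be checked mechanically via Lauritzen's moralized-ancestral-graph criterion: keep only ancestors of X, Y, and Z, marry co-parents, drop edge directions, delete Z, and test connectivity. A minimal sketch (the dict-of-parents DAG encoding and function names are illustrative, not from any particular library):

```python
from collections import deque

def ancestors(dag, nodes):
    """All ancestors of `nodes` in the DAG, including the nodes themselves."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in dag.get(stack.pop(), []):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(dag, x, y, z):
    """True iff x is d-separated from y given the set z of observed nodes."""
    keep = ancestors(dag, {x, y} | set(z))
    # Moralize the ancestral subgraph: undirected parent-child edges,
    # plus edges between co-parents of every node.
    adj = {n: set() for n in keep}
    for n in keep:
        ps = [p for p in dag.get(n, []) if p in keep]
        for p in ps:
            adj[n].add(p); adj[p].add(n)
        for i in range(len(ps)):
            for j in range(i + 1, len(ps)):
                adj[ps[i]].add(ps[j]); adj[ps[j]].add(ps[i])
    # Delete observed nodes, then check reachability x -> y by BFS.
    blocked, q, seen = set(z), deque([x]), {x}
    while q:
        n = q.popleft()
        if n == y:
            return False          # undirected path survives: dependent
        for m in adj[n]:
            if m not in seen and m not in blocked:
                seen.add(m); q.append(m)
    return True

# Burglary network as node -> list of parents.
dag = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}
print(d_separated(dag, "B", "E", set()))   # True: marginally independent
print(d_separated(dag, "B", "E", {"A"}))   # False: observing the collider opens the path
print(d_separated(dag, "J", "M", {"A"}))   # True: observed common cause blocks
```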
Computing joint and conditional probabilities
To compute any probability from a Bayesian network: (1) Write the joint using P(X₁,…,Xₙ) = ∏P(Xᵢ|Parents(Xᵢ)). (2) Sum out (marginalize) unwanted variables. (3) For conditional P(X|E=e), compute P(X,E=e) and normalize by summing over X values.
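Steps (1)-(3) can be sketched as brute-force enumeration on the classic burglary-alarm network (the CPT values below are the standard textbook ones):

```python
from itertools import product

# CPTs of the classic Burglary-Earthquake-Alarm-JohnCalls-MaryCalls network.
P_B, P_E = 0.001, 0.002
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}  # P(A=1 | b, e)
P_J = {1: 0.90, 0: 0.05}                                         # P(J=1 | a)
P_M = {1: 0.70, 0: 0.01}                                         # P(M=1 | a)

def joint(b, e, a, j, m):
    """Step 1: P(b, e, a, j, m) as a product of CPT entries."""
    pb = P_B if b else 1 - P_B
    pe = P_E if e else 1 - P_E
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return pb * pe * pa * pj * pm

# Steps 2-3: marginalize out E and A, then normalize over B.
num = sum(joint(1, e, a, 1, 1) for e, a in product((0, 1), repeat=2))
den = sum(joint(b, e, a, 1, 1) for b, e, a in product((0, 1), repeat=3))
print(f"P(B=1 | J=1, M=1) = {num / den:.4f}")   # ≈ 0.2842
```

Enumeration is exponential in the number of hidden variables; variable elimination reuses intermediate sums to do better.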
Bayesian network joint probability calculation — Burglary-Alarm example
# Classic Burglary(B)-Earthquake(E)-Alarm(A)-JohnCalls(J)-MaryCalls(M) network
# P(J=T, M=T, A=T, B=T, E=F) =
# P(B=T) * P(E=F) * P(A=T|B=T,E=F) * P(J=T|A=T) * P(M=T|A=T)
p = 0.001 * 0.998 * 0.94 * 0.90 * 0.70
print(f"P(J,M,A,B,not-E) ≈ {p:.7f}") # ≈ 0.0005910
# P(B=T | J=T, M=T) requires summing over E and A:
# Numerator: sum_{e,a} P(B=T, e, a, J=T, M=T)
# Denominator: sum_{b,e,a} P(b, e, a, J=T, M=T)
# This is the variable elimination problem — see the VE entry.
GATE PYQ Spotlight
GATE DA 2025 — Bayesian network joint probability (NAT, 2-mark)
A Bayesian network has four Bernoulli random variables U, V, W, Z with given CPTs. Compute P(U=1, V=1, W=1, Z=1). Approach: Write P(U,V,W,Z) = P(U|Parents(U)) · P(V|Parents(V)) · P(W|Parents(W)) · P(Z|Parents(Z)). Read CPT values and multiply. For a chain U→V→W→Z: P = P(U=1)·P(V=1|U=1)·P(W=1|V=1)·P(Z=1|W=1). This type of numerical calculation appears as an NAT question requiring 3 decimal place accuracy.
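A minimal sketch of the chain computation, with made-up CPT values standing in for the actual PYQ's tables:

```python
# Assumed CPT values for the chain U -> V -> W -> Z (illustrative only).
p_u1 = 0.4          # P(U=1)
p_v1_u1 = 0.7       # P(V=1 | U=1)
p_w1_v1 = 0.6       # P(W=1 | V=1)
p_z1_w1 = 0.9       # P(Z=1 | W=1)

# Joint is the product of one CPT entry per node.
p = p_u1 * p_v1_u1 * p_w1_v1 * p_z1_w1
print(f"{p:.3f}")   # 0.151
```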
GATE DA — Correct statements about Bayesian networks (MSQ, 2-mark)
Which statements are correct about Bayesian networks? (A) A Bayesian network is a directed acyclic graph (B) Each node stores P(Xᵢ | Parents(Xᵢ)) (C) Variables with no common ancestor are always independent (D) The joint probability is the product of all CPT entries for a given assignment Answer: (A), (B), (D) are correct. (C) is false — independence depends on d-separation, not just absence of common ancestors. Colliders can create dependencies upon conditioning even without common ancestors.
Practice questions
- What is conditional independence and why is it the foundation of Bayesian networks? (Answer: X is conditionally independent of Y given Z if P(X|Y,Z) = P(X|Z). Knowing Z renders Y irrelevant for predicting X. Bayesian networks exploit conditional independence to factorise high-dimensional joint distributions. Without conditional independence, representing P(X₁,...,X_n) requires 2ⁿ-1 parameters. With a Bayesian network encoding conditional independence relationships, the same distribution can be represented with far fewer parameters — the product of each variable's conditional probability table given its parents.)
- What is d-separation in a Bayesian network and how does it determine independence? (Answer: D-separation: two variables X and Y are d-separated given evidence Z if all paths between X and Y are blocked by Z. Three path patterns: (1) Chain X→Z→Y: Z blocks when observed. (2) Fork X←Z→Y: Z blocks when observed (the common cause accounts for the correlation). (3) Collider X→Z←Y: Z OPENS (not blocks) when observed — observing the collider, or any of its descendants, activates dependence. D-separation determines all conditional independence relationships encoded by the network structure.)
- A Bayesian network has Rain→WetGrass and Sprinkler→WetGrass. You observe WetGrass=True. What happens to the dependence between Rain and Sprinkler? (Answer: Explaining away (Berkson's paradox): Rain and Sprinkler are marginally independent (their only connection is through the collider WetGrass). When WetGrass is observed (a common child), they become DEPENDENT — observing WetGrass activates the Rain→WetGrass←Sprinkler collider path. If we know the grass is wet AND it is not raining, the sprinkler is probably on. Given the wet grass, learning one cause (Rain) makes the other cause (Sprinkler) less likely — they 'explain away' each other.)
- What is the difference between Bayesian network inference for queries vs learning parameters? (Answer: Inference: given a trained network with known CPTs, compute posterior probability of query variables given observed evidence. Algorithms: variable elimination, belief propagation, MCMC. Learning: estimate the CPT parameters from data. With complete data: maximum likelihood estimation (count frequencies). With missing data: EM algorithm (Expectation-Maximisation) iteratively estimates missing values and updates parameters. Structure learning (learning the graph structure) is computationally harder — NP-hard in general.)
- How does a Naive Bayes classifier relate to a Bayesian network? (Answer: Naive Bayes IS a Bayesian network with a specific structure: one class node C with directed edges to all feature nodes X₁,...,X_n, and no edges between features. The naive independence assumption (features are conditionally independent given the class) is exactly the d-separation structure of this network. The joint is: P(C, X₁,...,X_n) = P(C) Π P(Xᵢ|C). Any Naive Bayes model can be visualised and extended as a Bayesian network — adding dependencies between features when the naive assumption is violated.)
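The Naive Bayes factorisation from the last answer can be sketched numerically; all CPT values below are invented for illustration:

```python
# Naive Bayes as a Bayesian network: class C with edges to features X1, X2.
p_c = 0.5                                    # assumed P(C=1)
p_x_given_c = {                              # assumed P(Xi=1 | C=c)
    ("x1", 1): 0.8, ("x1", 0): 0.3,
    ("x2", 1): 0.6, ("x2", 0): 0.2,
}

def joint(c, x1, x2):
    """P(C=c, X1=x1, X2=x2) = P(C) * prod_i P(Xi | C)."""
    pc = p_c if c else 1 - p_c
    p1 = p_x_given_c[("x1", c)] if x1 else 1 - p_x_given_c[("x1", c)]
    p2 = p_x_given_c[("x2", c)] if x2 else 1 - p_x_given_c[("x2", c)]
    return pc * p1 * p2

# Posterior P(C=1 | X1=1, X2=1) by normalizing over the class values.
num = joint(1, 1, 1)
post = num / (num + joint(0, 1, 1))
print(round(post, 3))   # 0.889
```

Classification picks the class with the larger posterior; the normalization step is exactly the conditional-probability recipe used for any Bayesian network.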