Normalisation organises a relational database to reduce data redundancy and prevent update anomalies. It applies a series of Normal Forms guided by Functional Dependencies (FDs). 1NF eliminates repeating groups. 2NF eliminates partial dependencies. 3NF eliminates transitive dependencies. BCNF is stricter than 3NF. Normalisation is the highest-weightage GATE DBMS topic — numerical questions on FDs and NF appear almost every year.
Real-life analogy: The messy spreadsheet
Consider: (StudentID, StudentName, CourseID, CourseName, Instructor, InstructorPhone). Problems: (1) Changing InstructorPhone requires updating every row with that instructor — one missed row creates inconsistency (update anomaly). (2) Deleting a student removes course info (delete anomaly). (3) Cannot store a course with no students (insert anomaly). Normalisation splits this into clean, anomaly-free tables.
Functional Dependencies and Armstrong axioms
FD X→Y: knowing the value of X uniquely determines Y. If two tuples have the same X values, they must have the same Y values. Example: StudentID→StudentName, CourseID→CourseName.
Armstrong axioms: (1) Reflexivity: Y⊆X implies X→Y. (2) Augmentation: X→Y implies XZ→YZ. (3) Transitivity: X→Y and Y→Z implies X→Z. Use these to compute X+ (closure of X — all attributes determined by X).
Computing attribute closure and finding candidate keys
from itertools import combinations
def closure(X: set, FDs: list) -> set:
result = set(X)
changed = True
while changed:
changed = False
for (lhs, rhs) in FDs:
if lhs.issubset(result) and not rhs.issubset(result):
result |= rhs
changed = True
return result
def candidate_keys(attrs: set, FDs: list) -> list:
keys = []
for size in range(1, len(attrs) + 1):
for combo in combinations(attrs, size):
s = set(combo)
if closure(s, FDs) == attrs:
if not any(set(k) < s for k in keys):
keys.append(frozenset(s))
return [set(k) for k in keys]
# Example: R(A,B,C,D,E), FDs: A→B, BC→D, D→E
attrs = {'A','B','C','D','E'}
FDs = [({'A'},{'B'}), ({'B','C'},{'D'}), ({'D'},{'E'})]
print("Closure of {A,C}:", closure({'A','C'}, FDs)) # {A,B,C,D,E}
print("Candidate keys:", candidate_keys(attrs, FDs)) # [{A,C}]Normal Forms — 1NF through BCNF
| Normal Form | Requirement | Violation | Fix |
|---|---|---|---|
| 1NF | All attributes atomic, no repeating groups | Hobbies = "Cricket,Chess" multi-valued | Separate table StudentHobbies(StudentID, Hobby) |
| 2NF | 1NF + no partial dependency (no non-key attr depends on part of composite PK) | R(SID,CID,SName,Grade). SName depends only on SID. | Split: Student(SID,SName), Enrollment(SID,CID,Grade) |
| 3NF | 2NF + no transitive dependency | Employee(EmpID,DeptID,DeptName). DeptID→DeptName. | Split: Employee(EmpID,DeptID), Department(DeptID,DeptName) |
| BCNF | For every non-trivial FD X→Y, X must be a superkey | R(Course,Teacher,Student). Teacher→Course, Teacher is not a superkey. | Split: TeacherCourse(Teacher,Course), Enrollment(Student,Teacher) |
BCNF vs 3NF trade-off — GATE favourite
BCNF eliminates all redundancy but may lose some FDs (lossless but not always dependency-preserving). 3NF always preserves all FDs but allows slight redundancy. GATE pattern: "Is this in BCNF? In 3NF? Decompose it." Check: for every FD X→Y, is X a superkey (BCNF)? Or is X a superkey OR Y is prime (3NF)?
Practice questions (GATE-style)
- R(A,B,C,D). FDs: A→B, B→C, C→D, D→A. Find all candidate keys. (Answer: A→B→C→D→A is a cycle. Each single attribute determines all others. Candidate keys: {A}, {B}, {C}, {D}.)
- R(SID,CID,InstructorID,InstructorPhone). PK=(SID,CID). FD: InstructorID→InstructorPhone. Which NF is violated? (Answer: 3NF violated — InstructorPhone transitively depends on PK via InstructorID (non-key→non-key).)
- FD X→Y where X is not a superkey. R is definitely NOT in: (Answer: BCNF — BCNF requires every non-trivial FD LHS to be a superkey.)
- What is a lossless join decomposition? (Answer: Decomposing R into R1, R2 such that R1 ⋈ R2 = R exactly. Guaranteed lossless if R1∩R2 → R1 or R1∩R2 → R2.)
- Relation R(A,B,C). FDs: A→B, B→C, C→A. Is R in BCNF? (Answer: Yes — candidate keys are {A},{B},{C}. Every FD has a candidate key on the LHS, which is a superkey.)
On LumiChats
LumiChats can check normal forms, find candidate keys from FDs, and decompose relations into BCNF or 3NF step-by-step. Paste your relation schema and FDs and ask: 'Is this in BCNF? If not, decompose it.'
Try it free