Glossary/File Organisation & Indexing — B+ Trees and Hashing
Database Management Systems

File Organisation & Indexing — B+ Trees and Hashing

How databases find your data in milliseconds across millions of rows.


Definition

Indexing lets a database find rows without scanning every row. Without an index, finding a row in 1 million records requires up to 1 million comparisons. With a B+ tree index, the same lookup takes ~20 comparisons. B+ trees are the dominant index structure in PostgreSQL, MySQL InnoDB, and Oracle. Hash indexes support only equality lookups but are O(1). GATE tests B+ tree height, fan-out calculations, and index type selection.

Real-life analogy: The textbook index

Finding 'normalization' in a 900-page textbook: (1) No index = read every page = 900 reads. (2) With index: look up N, find normalization → page 423. Two steps. A database index works identically: small, sorted structure mapping search key values to disk block addresses.

B+ Tree structure and height

B+ tree height. N = leaf entries. m = order. For m=100, 1 million records: height = ceil(log₅₀(1,000,000)) = ceil(3.6) = 4. Only 4 disk reads to find any record!

B+ tree height and index type analysis

import math

def bplus_height(n_records: int, order: int) -> int:
    """Max height of B+ tree for n records with given order."""
    min_fanout = math.ceil(order / 2)
    if n_records <= order - 1:
        return 1
    return math.ceil(math.log(n_records, min_fanout))

# B+ tree examples
print(f"1M records, order 100:  height = {bplus_height(1_000_000, 100)}")    # 4
print(f"1B records, order 100:  height = {bplus_height(1_000_000_000, 100)}") # 5
print(f"10K records, order 10:  height = {bplus_height(10_000, 10)}")         # 5

# B+ tree vs sequential scan comparison
n = 1_000_000
order = 100
height = bplus_height(n, order)
print(f"
For {n:,} records with B+ tree order {order}:")
print(f"  Sequential scan: up to {n:,} block reads")
print(f"  B+ tree lookup:  {height} disk reads (height = {height})")
print(f"  Speedup: {n//height:,}x")

Index types

Index typeDescriptionBest for
Primary indexOn ordered key field — data physically sorted by thisRange queries on PK
Clustering indexOn non-key field, records physically ordered by that fieldRange queries on non-PK field
Secondary indexOn non-ordering field — records NOT sorted by this fieldEquality lookup on non-PK
Dense indexOne entry per recordFast lookup, more space
Sparse indexOne entry per disk blockLess space, only works with ordered data
Hash indexHash function maps key to bucketExact equality O(1), no range queries

GATE: Clustered vs non-clustered index

Clustered index physically orders table data — only ONE per table. Non-clustered index is a separate structure with pointers — multiple allowed. Primary key is automatically clustered in MySQL InnoDB. PostgreSQL: use CLUSTER command. B+ trees support both; hash indexes are always non-clustered.

Practice questions (GATE-style)

  1. B+ tree of order 5 with 500 records. What is the maximum height? (Answer: min fanout = ceil(5/2) = 3. ceil(log₃(500)) = ceil(5.66) = 6.)
  2. Why does B+ tree support range queries better than a B-tree? (Answer: B+ tree leaf nodes form a linked list — traverse leaves sequentially after finding start key. B-tree stores data in internal nodes too, requiring full tree traversal for range queries.)
  3. Hash index on Salary supports which query efficiently? (Answer: Only equality: WHERE Salary = 50000. Cannot support WHERE Salary > 50000 — hash scatters nearby values to different buckets.)
  4. Difference between dense and sparse index? (Answer: Dense: one entry per record — fast lookup, more space. Sparse: one entry per disk block — less space, only works on ordered data.)
  5. Relation has 10,000 records, B+ tree order = 50. What is the height? (Answer: min fanout = 25. ceil(log₂₅(10000)) = ceil(2.86) = 3. Just 3 disk accesses!)

On LumiChats

When LumiChats helps optimise slow SQL queries, the first suggestion is often 'add an index on the WHERE clause column.' Understanding B+ trees explains WHY: instead of scanning all rows, the DB traverses a 3-4 level tree to find your data instantly.

Try it free

Try LumiChats for ₹69

39+ AI models. Study Mode with page-locked answers. Agent Mode with code execution. Pay only on days you use it.

Get Started — ₹69/day

Related Terms

4 terms