Glossary/Star Schema, Snowflake Schema & Multidimensional Modelling
Data Warehousing

Star Schema, Snowflake Schema & Multidimensional Modelling

Designing data warehouses for fast analytical queries with fact and dimension tables.


Definition

Dimensional modelling organises data warehouses into fact tables (measurable business events — sales, clicks, orders) and dimension tables (descriptive context — who, what, when, where). The Star Schema: one central fact table surrounded by denormalised dimension tables. The Snowflake Schema: normalises dimension tables into sub-dimensions. Fact Constellation: multiple fact tables sharing dimension tables. These schemas are the foundation of OLAP and business intelligence.

Real-life analogy: The sales receipt

Every receipt captures: WHO bought it (customer), WHAT they bought (product), WHEN (date), WHERE (store), HOW MUCH (revenue, quantity). The receipt is the fact. Customer, Product, Date, Store are dimensions. A fact table = millions of receipts; dimension tables = master data describing each entity on the receipt.

Star Schema — the standard dimensional model

Complete star schema for retail sales analytics

-- DIMENSION TABLES (descriptive context — wide, denormalised)
CREATE TABLE dim_date (
    DateKey    INT          PRIMARY KEY,   -- Surrogate key e.g. 20241225
    FullDate   DATE         NOT NULL,
    Day        SMALLINT,    Month     SMALLINT,
    MonthName  VARCHAR(10), Quarter   SMALLINT,
    Year       SMALLINT,    DayOfWeek SMALLINT,
    DayName    VARCHAR(10), IsWeekend BOOLEAN,
    IsHoliday  BOOLEAN,     FiscalYear SMALLINT
);

CREATE TABLE dim_product (
    ProductKey   INT          PRIMARY KEY,   -- Surrogate key (stable, compact)
    ProductID    VARCHAR(20),                -- Business/natural key
    ProductName  VARCHAR(200) NOT NULL,
    SubCategory  VARCHAR(100),
    Category     VARCHAR(100),
    Brand        VARCHAR(100),
    UnitCost     DECIMAL(10,2),
    IsActive     BOOLEAN      DEFAULT TRUE,
    -- SCD Type 2: track historical changes
    EffectiveFrom DATE,  EffectiveTo DATE,  IsCurrent BOOLEAN DEFAULT TRUE
);

CREATE TABLE dim_customer (
    CustomerKey INT          PRIMARY KEY,
    CustomerID  VARCHAR(20),
    Name        VARCHAR(100),
    City        VARCHAR(100), State    VARCHAR(50),
    Country     VARCHAR(50),  Region   VARCHAR(50),
    Segment     VARCHAR(50)   -- Premium, Standard, Economy
);

CREATE TABLE dim_store (
    StoreKey  INT         PRIMARY KEY,
    StoreID   VARCHAR(20),
    StoreName VARCHAR(100),
    StoreType VARCHAR(50), City   VARCHAR(100),
    State     VARCHAR(50), Region VARCHAR(50)
);

-- FACT TABLE: measurable events (FKs to all dimensions + numeric measures)
CREATE TABLE fact_sales (
    SalesID     BIGSERIAL     PRIMARY KEY,
    DateKey     INT NOT NULL  REFERENCES dim_date(DateKey),
    ProductKey  INT NOT NULL  REFERENCES dim_product(ProductKey),
    CustomerKey INT NOT NULL  REFERENCES dim_customer(CustomerKey),
    StoreKey    INT NOT NULL  REFERENCES dim_store(StoreKey),
    -- MEASURES (the numbers you analyse)
    UnitsSold   INT           NOT NULL,
    Revenue     DECIMAL(12,2) NOT NULL,
    Cost        DECIMAL(12,2) NOT NULL,
    Profit      DECIMAL(12,2) GENERATED ALWAYS AS (Revenue - Cost) STORED,
    DiscountAmt DECIMAL(10,2) DEFAULT 0
);

-- Star schema query: revenue by category and quarter
SELECT
    p.Category, d.Year, d.Quarter,
    SUM(f.Revenue) AS Revenue,
    SUM(f.Profit)  AS Profit,
    SUM(f.UnitsSold) AS Units
FROM fact_sales f
JOIN dim_product p ON f.ProductKey = p.ProductKey
JOIN dim_date    d ON f.DateKey    = d.DateKey
WHERE p.IsActive = TRUE
GROUP BY p.Category, d.Year, d.Quarter
ORDER BY d.Year, d.Quarter, Revenue DESC;

Snowflake Schema vs Star Schema

PropertyStar SchemaSnowflake Schema
Dimension tablesDenormalised (flat, wide)Normalised (split into sub-tables)
StorageMore (redundancy in dims)Less (normalised)
Query performanceFaster (fewer JOINs)Slightly slower (more JOINs)
ETL complexitySimpler to loadMore complex transformation
Recommended forMost OLAP use casesVery large dimensions where storage matters

Snowflake schema — normalised product dimension

-- SNOWFLAKE: dim_product split into sub-dimensions
CREATE TABLE dim_category    (CategoryKey INT PK, CategoryName VARCHAR(100), Department VARCHAR(100));
CREATE TABLE dim_subcategory (SubcatKey INT PK, SubcatName VARCHAR(100), CategoryKey INT REFERENCES dim_category);
CREATE TABLE dim_brand       (BrandKey INT PK, BrandName VARCHAR(100), Country VARCHAR(50));

-- Product dimension references sub-tables (normalised)
CREATE TABLE dim_product_sf (
    ProductKey  INT PRIMARY KEY,
    ProductName VARCHAR(200),
    BrandKey    INT REFERENCES dim_brand(BrandKey),
    SubcatKey   INT REFERENCES dim_subcategory(SubcatKey)
);

-- Snowflake query needs more JOINs
SELECT cat.CategoryName, b.BrandName, SUM(f.Revenue)
FROM fact_sales f
JOIN dim_product_sf p  ON f.ProductKey = p.ProductKey
JOIN dim_subcategory s ON p.SubcatKey  = s.SubcatKey
JOIN dim_category cat  ON s.CategoryKey = cat.CategoryKey
JOIN dim_brand b       ON p.BrandKey   = b.BrandKey
GROUP BY cat.CategoryName, b.BrandName;

SCD — Slowly Changing Dimensions

Dimension data changes over time (customer address, product price, employee department). SCD types: Type 1: Overwrite (no history). Type 2: Add new row with new surrogate key, set EffectiveTo on old row — preserves full history. Type 3: Add Previous_Value column — limited history. SCD Type 2 is most common — preserves dimension state at time each fact was recorded.

Practice questions

  1. Difference between fact table and dimension table? (Answer: Fact: measurable numeric events (revenue, quantity). Dimension: descriptive context (name, category, date attributes). Facts are long (millions of rows); dimensions are wide (many descriptive columns, fewer rows).)
  2. Why do dimension tables use surrogate keys instead of natural/business keys? (Answer: Natural keys can change (email changes). Surrogate keys are stable, compact, fast to JOIN, and support SCD Type 2 — add new row for same entity while preserving history.)
  3. What is a degenerate dimension? (Answer: A dimension attribute stored directly in the fact table with no corresponding dimension table. Examples: OrderNumber, InvoiceID, TransactionID. No descriptive attributes — just the key itself.)
  4. Star schema has 4 dimension tables connected to 1 fact table. This is different from snowflake schema because: (Answer: In star schema, dimensions are directly connected (denormalised, flat). In snowflake, dimensions reference sub-dimension tables (normalised hierarchy).)
  5. Why are fact tables long (many rows) and dimension tables wide (many columns)? (Answer: Facts record every business event (every sale, click) — millions per day. Dimensions describe entities (10,000 products, 1M customers) — fewer rows but many descriptive attributes for rich filtering.)

On LumiChats

LumiChats can design a complete star schema for any business domain, generate the full DDL for fact and dimension tables, and write analytical SQL queries. Describe your business process and LumiChats builds the data model.

Try it free

Try LumiChats for ₹69

39+ AI models. Study Mode with page-locked answers. Agent Mode with code execution. Pay only on days you use it.

Get Started — ₹69/day

Related Terms

4 terms