Glossary/Modern SQL — JSON, Full-Text Search, PIVOT & Recursive Queries
SQL & Databases

Modern SQL — JSON, Full-Text Search, PIVOT & Recursive Queries

SQL in 2024 — storing JSON, searching text, and querying hierarchical data.


Definition

Modern SQL has evolved far beyond tabular data. PostgreSQL, MySQL 8+, and SQL Server now support native JSON storage and querying, full-text search with relevance ranking, array data types, PIVOT/UNPIVOT for cross-tab reports, and recursive CTEs for hierarchical data (org charts, product categories, social graphs). These features let you handle document-style data, semi-structured API responses, and graph-like relationships directly in SQL without switching to a NoSQL database.

JSON storage and querying

JSON columns — store, query, index, and update

-- Store JSON data in a column
CREATE TABLE Product (
    ProductID   INT     PRIMARY KEY,
    Name        VARCHAR(200) NOT NULL,
    Attributes  JSONB   NOT NULL,  -- JSONB: binary JSON (indexed, faster queries)
    Tags        TEXT[]  DEFAULT '{}' -- Array of text
);

-- Insert JSON data
INSERT INTO Product VALUES (
    1, 'Laptop Pro 16',
    '{
        "brand": "TechCorp",
        "specs": {"ram": 32, "storage": "1TB", "cpu": "M3 Pro"},
        "colors": ["Silver", "Space Grey"],
        "warranty": {"years": 3, "international": true}
    }',
    ARRAY['electronics', 'laptop', 'premium']
);

-- Query JSON fields (PostgreSQL JSONB operators)
SELECT
    Name,
    Attributes->>'brand'                        AS Brand,           -- Text extraction
    Attributes->'specs'->>'ram'                 AS RAMgb,           -- Nested
    Attributes->'specs'->'storage'              AS StorageJSON,     -- JSON type
    Attributes->'specs'->>'storage'             AS StorageText,     -- Text type
    Attributes->'warranty'->>'international'    AS HasIntlWarranty
FROM Product
WHERE Attributes->>'brand' = 'TechCorp'
  AND (Attributes->'specs'->>'ram')::INT > 16
  AND Attributes @> '{"warranty": {"international": true}}';  -- JSON containment

-- JSON array operations
SELECT Name FROM Product WHERE Tags @> ARRAY['laptop'];       -- Array contains
SELECT Name FROM Product WHERE 'premium' = ANY(Tags);         -- Any element matches
SELECT Name, ARRAY_LENGTH(Tags, 1) AS TagCount FROM Product;

-- GIN index on JSONB for fast JSON queries
CREATE INDEX idx_product_attrs ON Product USING GIN(Attributes);
CREATE INDEX idx_product_tags  ON Product USING GIN(Tags);

-- Update specific JSON field (PostgreSQL)
UPDATE Product
SET Attributes = Attributes ||
    '{"warranty": {"years": 5, "international": true}}'::JSONB
WHERE ProductID = 1;

UPDATE Product
SET Attributes = jsonb_set(Attributes, '{specs, ram}', '64')
WHERE ProductID = 1;

-- Expand JSON array to rows
SELECT ProductID, jsonb_array_elements_text(Attributes->'colors') AS Color
FROM Product;

-- Aggregate multiple rows into JSON (build JSON responses)
SELECT json_agg(
    json_build_object('id', EmpID, 'name', Name, 'salary', Salary)
    ORDER BY Salary DESC
) AS employees_json
FROM Employee WHERE DeptID = 1;

Full-text search

Full-text search — better than LIKE for text content

-- PostgreSQL full-text search
-- tsvector: preprocessed searchable document (tokenised, stemmed)
-- tsquery: search query

-- Create a full-text search index
CREATE INDEX idx_article_fts ON Article
USING GIN (to_tsvector('english', title || ' ' || content));

-- Basic full-text search
SELECT title, content
FROM Article
WHERE to_tsvector('english', title || ' ' || content) @@ to_tsquery('english', 'machine & learning');

-- Search operators:
-- to_tsquery('cat & dog')        -- Both words
-- to_tsquery('cat | dog')        -- Either word
-- to_tsquery('!dog')             -- Not dog
-- to_tsquery('supercal:*')       -- Prefix match
-- phraseto_tsquery('machine learning') -- Exact phrase (word order matters)
-- websearch_to_tsquery('machine learning -deep') -- Natural language

-- Rank results by relevance
SELECT
    title,
    ts_rank(to_tsvector('english', content), to_tsquery('database')) AS Rank
FROM Article
WHERE to_tsvector('english', content) @@ to_tsquery('database')
ORDER BY Rank DESC
LIMIT 10;

-- Highlight matching terms
SELECT
    title,
    ts_headline('english', content, to_tsquery('database'),
                'StartSel=<b>,StopSel=</b>,MaxWords=30') AS Snippet
FROM Article
WHERE to_tsvector('english', content) @@ to_tsquery('database');

-- MySQL full-text search
CREATE FULLTEXT INDEX idx_article_ft ON Article(title, content);
SELECT *, MATCH(title, content) AGAINST('machine learning' IN NATURAL LANGUAGE MODE) AS score
FROM Article
WHERE MATCH(title, content) AGAINST('machine learning' IN NATURAL LANGUAGE MODE)
ORDER BY score DESC;

PIVOT and UNPIVOT — cross-tab reports

PIVOT: rows to columns, UNPIVOT: columns to rows

-- PIVOT: transform row values into column headers
-- Without PIVOT: each month is a row
-- With PIVOT: each month becomes a column

-- Method 1: CASE-based PIVOT (works in all DBMS)
SELECT
    Region,
    SUM(CASE WHEN Month = 1 THEN Revenue ELSE 0 END) AS Jan,
    SUM(CASE WHEN Month = 2 THEN Revenue ELSE 0 END) AS Feb,
    SUM(CASE WHEN Month = 3 THEN Revenue ELSE 0 END) AS Mar,
    SUM(CASE WHEN Month = 4 THEN Revenue ELSE 0 END) AS Apr,
    SUM(CASE WHEN Month = 5 THEN Revenue ELSE 0 END) AS May,
    SUM(CASE WHEN Month = 6 THEN Revenue ELSE 0 END) AS Jun,
    SUM(Revenue) AS Annual
FROM MonthlySales
WHERE Year = 2024
GROUP BY Region
ORDER BY Annual DESC;

-- Method 2: SQL Server PIVOT operator (native)
SELECT Region, [1] AS Jan, [2] AS Feb, [3] AS Mar
FROM MonthlySales
PIVOT (
    SUM(Revenue)
    FOR Month IN ([1], [2], [3])
) AS PivotTable;

-- UNPIVOT: columns → rows (reverse of PIVOT)
-- Wide format table: Jan, Feb, Mar columns
-- → Narrow format: one row per month

-- Method 1: UNION ALL UNPIVOT (any DBMS)
SELECT Region, 'Jan' AS Month, Jan AS Revenue FROM RegionMonthly
UNION ALL SELECT Region, 'Feb', Feb FROM RegionMonthly
UNION ALL SELECT Region, 'Mar', Mar FROM RegionMonthly;

-- Method 2: SQL Server UNPIVOT operator
SELECT Region, Month, Revenue
FROM RegionMonthly
UNPIVOT (Revenue FOR Month IN (Jan, Feb, Mar)) AS Unpivoted;

-- PostgreSQL: crosstab() function (requires tablefunc extension)
CREATE EXTENSION IF NOT EXISTS tablefunc;

SELECT * FROM crosstab(
    'SELECT Region, Month, SUM(Revenue) FROM MonthlySales GROUP BY Region, Month ORDER BY 1,2',
    'SELECT DISTINCT Month FROM MonthlySales ORDER BY 1'
) AS ct(Region TEXT, "Jan" NUMERIC, "Feb" NUMERIC, "Mar" NUMERIC);

Recursive CTEs for hierarchical data

Recursive CTEs for org charts, categories, and paths

-- Recursive CTE structure:
-- WITH RECURSIVE cte_name AS (
--     base_case_query       -- Non-recursive: starting point
--     UNION ALL
--     recursive_case_query  -- References cte_name: each iteration
-- )
-- SELECT * FROM cte_name;   -- IMPORTANT: must have termination condition!

-- Example 1: Org chart — find all reports under a manager
WITH RECURSIVE OrgChart AS (
    -- Base case: the starting manager (e.g., CEO)
    SELECT EmpID, Name, ManagerID, 0 AS Level, ARRAY[EmpID] AS Path
    FROM Employee
    WHERE ManagerID IS NULL                    -- Root node(s)

    UNION ALL

    -- Recursive case: employees whose manager is in current result
    SELECT e.EmpID, e.Name, e.ManagerID,
           oc.Level + 1,
           oc.Path || e.EmpID                  -- Track the path
    FROM Employee e
    JOIN OrgChart oc ON e.ManagerID = oc.EmpID -- Join to previous level
    WHERE NOT e.EmpID = ANY(oc.Path)           -- Cycle detection!
)
SELECT
    REPEAT('  ', Level) || Name AS OrgName,    -- Indent by level
    Level,
    Path
FROM OrgChart
ORDER BY Path;

-- Example 2: Category tree with full path
WITH RECURSIVE CategoryPath AS (
    SELECT CatID, Name, ParentID, Name::TEXT AS FullPath, 1 AS Depth
    FROM Category WHERE ParentID IS NULL    -- Top-level categories

    UNION ALL

    SELECT c.CatID, c.Name, c.ParentID,
           cp.FullPath || ' > ' || c.Name,  -- Build breadcrumb path
           cp.Depth + 1
    FROM Category c JOIN CategoryPath cp ON c.ParentID = cp.CatID
)
SELECT CatID, FullPath, Depth FROM CategoryPath ORDER BY FullPath;

-- Example 3: Shortest path in a graph (weighted edges)
WITH RECURSIVE Paths AS (
    SELECT FromNode, ToNode, Weight, ARRAY[FromNode, ToNode] AS Visited
    FROM Edges WHERE FromNode = 'A'

    UNION ALL

    SELECT p.FromNode, e.ToNode, p.Weight + e.Weight, p.Visited || e.ToNode
    FROM Paths p JOIN Edges e ON p.ToNode = e.FromNode
    WHERE NOT e.ToNode = ANY(p.Visited)    -- No cycles
)
SELECT * FROM Paths WHERE ToNode = 'Z'
ORDER BY Weight LIMIT 1;   -- Shortest path from A to Z

Practice questions

  1. PostgreSQL JSONB vs JSON — which should you use and why? (Answer: Always prefer JSONB. JSON stores the raw text (preserves whitespace, key order). JSONB stores a parsed binary representation — faster for queries and comparisons, supports GIN indexing, but slightly slower to INSERT. The only reason to use JSON is when you need to preserve exact key order or insignificant whitespace.)
  2. Why is full-text search better than LIKE '%keyword%' for large text columns? (Answer: LIKE '%keyword%' requires a full sequential scan (no index possible). Full-text search uses a pre-built inverted index (GIN) that maps words to row IDs — O(log n) lookup. Also supports stemming ('run' matches 'running'), relevance ranking, and phrase matching — impossible with LIKE.)
  3. What problem does a recursive CTE solve that a regular CTE cannot? (Answer: Hierarchical/graph data where the depth is unknown. Regular SQL cannot express "traverse all levels of an org chart" because the depth varies. Recursive CTEs iterate until no new rows are produced — handling arbitrary-depth hierarchies.)
  4. In a recursive CTE, what prevents an infinite loop? (Answer: The termination condition in the recursive query. Two approaches: (1) Natural termination — when the JOIN finds no more matching rows. (2) Explicit cycle detection — NOT e.EmpID = ANY(visited_path) prevents revisiting nodes. Always include cycle detection for graph data where cycles are possible.)
  5. PIVOT converts rows to columns. When would you use UNPIVOT? (Answer: When data is stored in wide format (one column per attribute) but you need narrow/tall format for analysis. Example: a table with Jan, Feb, Mar as columns → UNPIVOT creates rows Month, Revenue for easier GROUP BY, filtering, and time-series analysis.)

On LumiChats

LumiChats can write JSON querying expressions for PostgreSQL JSONB, build recursive org chart queries, generate PIVOT reports, and implement full-text search with ranking and highlighting. Describe your data structure and analytical goal.

Try it free

Try LumiChats for ₹69

39+ AI models. Study Mode with page-locked answers. Agent Mode with code execution. Pay only on days you use it.

Get Started — ₹69/day

Related Terms

3 terms