Glossary/SQL Practical Patterns — Top-N, Deduplication, Running Totals & Gaps
SQL & Databases

SQL Practical Patterns — Top-N, Deduplication, Running Totals & Gaps

Real-world SQL patterns every developer and analyst uses daily.


Definition

Beyond basic SELECT, certain SQL patterns appear constantly in production systems: finding the top-N records per group, deduplicating data with duplicates, computing running totals and cumulative sums, finding gaps in sequential data, generating date series, and calculating percentiles. These patterns combine window functions, CTEs, self-joins, and set operations in creative ways. Mastering these patterns separates intermediate SQL users from expert practitioners.

Top-N queries — getting the best per group

Top-N queries using ROW_NUMBER, RANK, and LATERAL

-- Pattern 1: Top-1 per group (most common)
-- Find the highest-paid employee in each department

-- Using ROW_NUMBER (returns exactly 1, even for ties)
WITH RankedEmployees AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY DeptID ORDER BY Salary DESC) AS rn
    FROM Employee
)
SELECT EmpID, Name, DeptID, Salary FROM RankedEmployees WHERE rn = 1;

-- Using RANK (returns all tied for 1st)
WITH RankedEmployees AS (
    SELECT *, RANK() OVER (PARTITION BY DeptID ORDER BY Salary DESC) AS rnk
    FROM Employee
)
SELECT EmpID, Name, DeptID, Salary FROM RankedEmployees WHERE rnk = 1;

-- Top-3 per department
WITH RankedEmployees AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY DeptID ORDER BY Salary DESC) AS rn
    FROM Employee
)
SELECT EmpID, Name, DeptID, Salary, rn AS Rank
FROM RankedEmployees WHERE rn <= 3
ORDER BY DeptID, rn;

-- Alternative: LATERAL JOIN (PostgreSQL) — often faster
SELECT d.DeptName, e.Name, e.Salary
FROM Department d
CROSS JOIN LATERAL (
    SELECT Name, Salary FROM Employee
    WHERE DeptID = d.DeptID
    ORDER BY Salary DESC
    LIMIT 3
) e;

-- Nth row (e.g., 2nd highest salary)
SELECT DISTINCT Salary FROM Employee
ORDER BY Salary DESC
LIMIT 1 OFFSET 1;   -- Skip 1, take 1 → 2nd highest

Deduplication — finding and removing duplicates

Finding, counting, and removing duplicate rows

-- Find all duplicate emails in Customer table
SELECT Email, COUNT(*) AS Occurrences
FROM Customer
GROUP BY Email
HAVING COUNT(*) > 1
ORDER BY Occurrences DESC;

-- Show all rows that have duplicates (with full row data)
SELECT c.*
FROM Customer c
JOIN (
    SELECT Email FROM Customer GROUP BY Email HAVING COUNT(*) > 1
) dupes ON c.Email = dupes.Email
ORDER BY c.Email;

-- IDENTIFY duplicates using ROW_NUMBER (keep the first, mark others)
WITH DedupCTE AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY Email ORDER BY CustomerID ASC) AS rn
           -- Partition by duplicate key, keep lowest ID as original
    FROM Customer
)
SELECT * FROM DedupCTE WHERE rn > 1;   -- These are the duplicates

-- DELETE duplicates — keep the one with the lowest CustomerID
WITH DedupCTE AS (
    SELECT CustomerID,
           ROW_NUMBER() OVER (PARTITION BY Email ORDER BY CustomerID ASC) AS rn
    FROM Customer
)
DELETE FROM Customer
WHERE CustomerID IN (SELECT CustomerID FROM DedupCTE WHERE rn > 1);

-- Bulk deduplicate into a new table (safer for large tables)
CREATE TABLE Customer_Clean AS
SELECT DISTINCT ON (Email) *        -- PostgreSQL: keep one row per Email
FROM Customer
ORDER BY Email, CustomerID;         -- Keep the row with lowest CustomerID

-- DISTINCT vs GROUP BY for deduplication:
SELECT DISTINCT Name, Email FROM Customer;  -- Remove fully duplicate rows
SELECT Name, Email FROM Customer GROUP BY Name, Email;  -- Equivalent result

Running totals, cumulative sums, and moving averages

Cumulative calculations with window functions

-- Running total (cumulative sum)
SELECT
    OrderDate,
    Amount,
    SUM(Amount) OVER (ORDER BY OrderDate
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningTotal
FROM Orders WHERE CustomerID = 101
ORDER BY OrderDate;

-- Cumulative % of total
SELECT
    Category,
    Revenue,
    ROUND(100.0 * SUM(Revenue) OVER (ORDER BY Revenue DESC
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
          / SUM(Revenue) OVER (), 1)           AS CumulativePct
FROM CategoryRevenue
ORDER BY Revenue DESC;

-- 7-day moving average
SELECT
    OrderDate,
    DailyRevenue,
    AVG(DailyRevenue) OVER (ORDER BY OrderDate
                            ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS MA7,
    AVG(DailyRevenue) OVER (ORDER BY OrderDate
                            ROWS BETWEEN 29 PRECEDING AND CURRENT ROW) AS MA30
FROM DailyRevenue
ORDER BY OrderDate;

-- Period-over-period growth (month over month)
SELECT
    Month,
    Revenue,
    LAG(Revenue) OVER (ORDER BY Month) AS PrevMonthRevenue,
    ROUND(100.0 * (Revenue - LAG(Revenue) OVER (ORDER BY Month))
          / LAG(Revenue) OVER (ORDER BY Month), 1) AS MoMGrowthPct
FROM MonthlyRevenue
ORDER BY Month;

-- YTD (Year-to-Date) cumulative revenue, resetting each year
SELECT
    OrderDate,
    EXTRACT(YEAR FROM OrderDate)  AS Year,
    EXTRACT(MONTH FROM OrderDate) AS Month,
    Revenue,
    SUM(Revenue) OVER (
        PARTITION BY EXTRACT(YEAR FROM OrderDate)  -- Reset at year start
        ORDER BY OrderDate
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS YTD_Revenue
FROM MonthlyRevenue
ORDER BY OrderDate;

Gaps and islands — finding missing sequences

Gaps: missing IDs/dates; Islands: consecutive sequences

-- GAP DETECTION: find missing order IDs in sequence 1..MAX
-- Expected: 1,2,3,4,5... Actual: 1,2,4,5,7 → gaps at 3 and 6

-- Method 1: Generate series and find missing
SELECT generate_series(1, MAX(OrderID)) AS MissingID  -- PostgreSQL
FROM Orders
EXCEPT
SELECT OrderID FROM Orders;
-- Returns: 3, 6 (the missing IDs)

-- Method 2: Self-join to find gaps
SELECT o1.OrderID + 1 AS GapStart,
       MIN(o2.OrderID) - 1 AS GapEnd
FROM Orders o1
JOIN Orders o2 ON o2.OrderID > o1.OrderID
WHERE NOT EXISTS (SELECT 1 FROM Orders o3 WHERE o3.OrderID = o1.OrderID + 1)
GROUP BY o1.OrderID
HAVING o1.OrderID + 1 <= MIN(o2.OrderID) - 1;

-- ISLAND DETECTION: find consecutive groups
-- Example: Employee sessions — find each continuous logged-in period
-- Sessions: days 1,2,3, then gap, then 7,8,9 = two islands

WITH DateGroups AS (
    SELECT LoginDate,
           LoginDate - INTERVAL '1 day' * ROW_NUMBER()
                       OVER (PARTITION BY EmpID ORDER BY LoginDate) AS GroupKey
    FROM LoginHistory WHERE EmpID = 101
)
SELECT
    MIN(LoginDate) AS PeriodStart,
    MAX(LoginDate) AS PeriodEnd,
    COUNT(*)       AS ConsecutiveDays
FROM DateGroups
GROUP BY GroupKey
ORDER BY PeriodStart;

-- Date series generation (fill in missing dates for reporting)
-- Generate every day in 2024 even if no orders that day
SELECT
    d.date AS OrderDate,
    COALESCE(SUM(o.Amount), 0) AS DailyRevenue
FROM generate_series('2024-01-01'::DATE, '2024-12-31'::DATE, '1 day') AS d(date)
LEFT JOIN Orders o ON o.OrderDate = d.date
GROUP BY d.date
ORDER BY d.date;

SQL interview patterns to memorise

The most common SQL interview questions are exactly these patterns: (1) Top-N per group with ROW_NUMBER + CTE. (2) Delete duplicates keeping one (ROW_NUMBER + DELETE WHERE rn > 1). (3) Running totals with SUM OVER (ORDER BY ... ROWS UNBOUNDED PRECEDING). (4) Period-over-period growth with LAG(). (5) Find missing records with NOT EXISTS or EXCEPT. Practice these 5 and you will ace most SQL interviews.

Practice questions

  1. Find the 3rd highest salary without using LIMIT/OFFSET or TOP. (Answer: SELECT MIN(Salary) FROM (SELECT DISTINCT TOP 3 Salary FROM Employee ORDER BY Salary DESC) sub. Or: SELECT Salary FROM Employee e1 WHERE 2 = (SELECT COUNT(DISTINCT Salary) FROM Employee e2 WHERE e2.Salary > e1.Salary).)
  2. You have a Customer table with duplicate emails. Write SQL to delete duplicates keeping the oldest record (lowest CustomerID). (Answer: DELETE FROM Customer WHERE CustomerID NOT IN (SELECT MIN(CustomerID) FROM Customer GROUP BY Email))
  3. What window frame clause would give you a 4-week moving average? (Answer: AVG(value) OVER (ORDER BY date ROWS BETWEEN 27 PRECEDING AND CURRENT ROW) — 28 rows total (current + 27 preceding = 4 weeks of daily data).)
  4. Explain the "Islands" problem with an example. (Answer: Given a log of daily user logins, find continuous login streaks. Technique: subtract ROW_NUMBER from the date — consecutive dates produce the same GroupKey. GROUP BY GroupKey gives each consecutive streak as one group with MIN/MAX dates.)
  5. Generate a report showing revenue for every day of 2024, including days with zero orders. How do you include the zero-revenue days? (Answer: Use generate_series() or a calendar table LEFT JOINed to orders: days with no orders appear with NULL which COALESCE(SUM(Amount), 0) converts to 0.)

On LumiChats

These patterns are used in production systems daily. LumiChats can solve any variant of these patterns: 'Find the top 5 products by revenue for each region, excluding products with less than 100 sales' — generates the complete window function query instantly.

Try it free

Try LumiChats for ₹69

39+ AI models. Study Mode with page-locked answers. Agent Mode with code execution. Pay only on days you use it.

Get Started — ₹69/day

Related Terms

4 terms