5 SQL Patterns That Separate Junior Analysts from Senior Ones
Everyone can write a SELECT statement. The analysts who get promoted write window functions in their sleep, use CTEs for readability, and know exactly when a subquery will kill performance. Here are the 5 patterns that changed my career.
The Query That Got Me Hired
During my interview at AlCircle, they gave me a take-home SQL test. The dataset had 2.3 million rows of aluminium transaction data. The question: "Find the top 3 products by revenue growth rate for each region, quarter over quarter."
Most candidates wrote nested subqueries with GROUP BY and HAVING. It worked, but it took 45 seconds to run and was nearly impossible to debug.
I wrote it with window functions and CTEs. It ran in 1.2 seconds and was 20 lines of readable SQL.
Same answer. Different approach. They offered me the position the next day.
The difference between junior and senior SQL isn't knowing more syntax. It's knowing which pattern solves the problem cleanly. Here are the five patterns I use every single day.
Pattern 1: Window Functions for Running Totals and Rankings
If you're still using self-joins to calculate running totals, you're writing 2010 SQL.
-- Running total of monthly revenue
SELECT
    month,
    revenue,
    SUM(revenue) OVER (ORDER BY month) AS running_total,
    revenue - LAG(revenue) OVER (ORDER BY month) AS month_over_month_change
FROM monthly_sales;
The LAG() and LEAD() functions let you reference adjacent rows without joining the table to itself. I use this daily for trend analysis — month-over-month changes, week-over-week comparisons, and gap analysis.
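LEAD() works the same way in the other direction, which is exactly what gap analysis needs. A quick sketch, assuming an orders table with customer_id and order_date columns (illustrative names, and date subtraction yielding days is PostgreSQL behavior):

```sql
-- Days until each customer's next order (gap analysis)
SELECT
    customer_id,
    order_date,
    LEAD(order_date) OVER (
        PARTITION BY customer_id ORDER BY order_date
    ) - order_date AS days_until_next_order
FROM orders;
```

LEAD() returns NULL on each customer's last row, which conveniently flags their most recent order.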
For rankings, RANK() and ROW_NUMBER() replace the old "find the max, join back to get the row" pattern:
-- Top 3 products per region
SELECT * FROM (
    SELECT
        region,
        product,
        revenue,
        -- "rank" is a reserved word in several engines, so use a different alias
        ROW_NUMBER() OVER (PARTITION BY region ORDER BY revenue DESC) AS revenue_rank
    FROM sales
) ranked
WHERE revenue_rank <= 3;
Pattern 2: CTEs Over Subqueries
Common Table Expressions (CTEs) don't change what your query does. They change whether someone can read it.
-- Without CTEs (hard to read)
SELECT * FROM (
    SELECT region, AVG(revenue) AS avg_rev
    FROM sales
    WHERE year = 2025
    GROUP BY region
) sub
WHERE avg_rev > (
    SELECT AVG(revenue) FROM sales WHERE year = 2025
);
-- With CTEs (readable)
WITH regional_avg AS (
    SELECT region, AVG(revenue) AS avg_rev
    FROM sales
    WHERE year = 2025
    GROUP BY region
),
global_avg AS (
    SELECT AVG(revenue) AS global_rev
    FROM sales
    WHERE year = 2025
)
SELECT r.region, r.avg_rev, g.global_rev
FROM regional_avg r
CROSS JOIN global_avg g
WHERE r.avg_rev > g.global_rev;
Same logic. But the CTE version names each step. When your colleague opens this query in six months, they'll understand it in 30 seconds instead of spending 10 minutes untangling nested subqueries.
Rule of thumb: If your subquery is more than 3 lines, make it a CTE. If you reference the same logic twice, make it a CTE.
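The second half of that rule — reuse — is where CTEs pay off most. A sketch of one CTE referenced twice, assuming an integer quarter column that doesn't wrap across years (illustrative; in real data you'd join on year and quarter):

```sql
-- Quarter-over-quarter change, aggregating only once
WITH quarterly_rev AS (
    SELECT region, quarter, SUM(revenue) AS rev
    FROM sales
    GROUP BY region, quarter
)
SELECT
    cur.region,
    cur.quarter,
    cur.rev,
    cur.rev - prev.rev AS qoq_change
FROM quarterly_rev cur
JOIN quarterly_rev prev
  ON prev.region = cur.region
 AND prev.quarter = cur.quarter - 1;
```

Without the CTE, that aggregation would be pasted twice as identical subqueries, and a future edit to one copy but not the other is a silent bug.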
Pattern 3: Conditional Aggregation
This one is subtle but powerful. Instead of running a separate filtered query for each metric, compute them all in one pass by moving each filter inside its aggregate with CASE:
-- How many new vs returning customers per month?
SELECT
    DATE_TRUNC('month', order_date) AS month,
    COUNT(DISTINCT customer_id) AS total_customers,
    COUNT(DISTINCT CASE WHEN is_first_order THEN customer_id END) AS new_customers,
    COUNT(DISTINCT CASE WHEN NOT is_first_order THEN customer_id END) AS returning_customers
FROM orders
GROUP BY 1;
This single scan produces what would otherwise require three separate queries or a complex UNION. I use this pattern constantly for dashboard metrics where you need multiple aggregations over the same dataset.
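The same trick works with SUM for splitting a metric, not just counting rows. A sketch, assuming the orders table has status and amount columns (hypothetical names):

```sql
-- Revenue split by order status, one scan
SELECT
    DATE_TRUNC('month', order_date) AS month,
    SUM(amount) AS total_revenue,
    SUM(CASE WHEN status = 'completed' THEN amount ELSE 0 END) AS completed_revenue,
    SUM(CASE WHEN status = 'refunded'  THEN amount ELSE 0 END) AS refunded_revenue
FROM orders
GROUP BY 1;
```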
Pattern 4: EXISTS Over IN for Large Datasets
-- Slow with large subquery results
SELECT * FROM products
WHERE product_id IN (SELECT product_id FROM orders WHERE year = 2025);
-- Often faster: can stop scanning at the first match
SELECT * FROM products p
WHERE EXISTS (
    SELECT 1 FROM orders o
    WHERE o.product_id = p.product_id AND o.year = 2025
);
With IN, the engine may materialize the full subquery result before checking membership; EXISTS is a correlated check that can stop at the first match. Modern optimizers often rewrite both into the same semi-join plan, but when they don't — especially with subqueries returning 100K+ rows — the difference is dramatic, so check the execution plan rather than assume.
Switching from IN to EXISTS on a product-order join reduced query time from 14.2 seconds to 1.1 seconds. The orders table had 2.3M rows; the IN version was loading all matching product_ids before filtering.
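The inverse pattern is just as useful, and safer than NOT IN: a single NULL product_id in the subquery makes NOT IN return zero rows, while NOT EXISTS behaves as expected. A sketch on the same tables:

```sql
-- Products with no orders in 2025 (anti-join)
SELECT *
FROM products p
WHERE NOT EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.product_id = p.product_id
      AND o.year = 2025
);
```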
Pattern 5: DATE Arithmetic for Period Comparisons
Every analyst needs to compare "this period vs last period." The naive approach is hardcoding dates. The pro approach is dynamic date arithmetic:
-- This week vs same week last year
WITH current_week AS (
    SELECT *
    FROM sales
    WHERE order_date >= DATE_TRUNC('week', CURRENT_DATE)
      AND order_date <  DATE_TRUNC('week', CURRENT_DATE) + INTERVAL '7 days'
),
prior_year_week AS (
    SELECT *
    FROM sales
    WHERE order_date >= DATE_TRUNC('week', CURRENT_DATE - INTERVAL '1 year')
      AND order_date <  DATE_TRUNC('week', CURRENT_DATE - INTERVAL '1 year') + INTERVAL '7 days'
)
SELECT 'Current' AS period, SUM(revenue) AS total_revenue
FROM current_week
UNION ALL
SELECT 'Prior Year' AS period, SUM(revenue) AS total_revenue
FROM prior_year_week;
This pattern works for any period — week, month, quarter, year. And because the dates are calculated dynamically, your report never goes stale.
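Swapping the unit is all it takes. Here's the same idea for this quarter vs the same quarter last year, condensed into one scan with conditional aggregation (Pattern 3):

```sql
-- This quarter vs same quarter last year, one scan
SELECT
    SUM(CASE WHEN order_date >= DATE_TRUNC('quarter', CURRENT_DATE)
             THEN revenue ELSE 0 END) AS current_quarter,
    SUM(CASE WHEN order_date >= DATE_TRUNC('quarter', CURRENT_DATE - INTERVAL '1 year')
              AND order_date <  DATE_TRUNC('quarter', CURRENT_DATE - INTERVAL '1 year') + INTERVAL '3 months'
             THEN revenue ELSE 0 END) AS prior_year_quarter
FROM sales
WHERE order_date >= DATE_TRUNC('quarter', CURRENT_DATE - INTERVAL '1 year');
```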
The Meta-Skill
None of these patterns is advanced SQL. They're all available in PostgreSQL, SQL Server, MySQL (8.0+ for window functions and CTEs), and BigQuery. The barrier isn't knowledge; it's recognition.
Junior analysts reach for subqueries because that's what they learned first. Senior analysts assess the problem, pick the pattern that fits, and write clean SQL that runs fast and reads clearly.
The best SQL code is code that someone else can understand without asking you questions.
Which SQL pattern do you wish you'd learned earlier? I'm writing a follow-up on performance tuning — what should I cover?