Grouping Data with GROUP BY

Now that you've mastered aggregate functions like COUNT(), SUM(), and AVG(), you're ready to take your data analysis skills to the next level. In this lesson, you'll learn how to use the GROUP BY clause to organize your data into meaningful groups and calculate aggregate values for each group separately.

Learning Goals:

Understand when and why to use GROUP BY
Write queries that group data by one or more columns
Combine GROUP BY with aggregate functions
Recognize the difference between WHERE and HAVING

Understanding GROUP BY

The GROUP BY clause divides rows returned by a query into groups based on one or more columns. For each group, you can apply aggregate functions to calculate summary values.

Think of it like organizing orders: you could group them by status to count how many are delivered vs pending, or group products by category to calculate the average price in each category.

Basic GROUP BY Syntax

Basic GROUP BY structure
SELECT column1, aggregate_function(column2)
FROM table_name
GROUP BY column1;

The GROUP BY clause always comes after the FROM and WHERE clauses, but before ORDER BY.

Single Column Grouping

Let's start with a simple example from the marketplace schema: counting how many customers prefer each category.

Grouping customers by preferred category
SELECT COALESCE(cat.category_name, 'No preference') AS category_name,
       COUNT(*) AS customer_count
FROM customer AS c
LEFT JOIN category AS cat ON cat.category_id = c.preferred_category_id
GROUP BY COALESCE(cat.category_name, 'No preference')
ORDER BY customer_count DESC;

This query would return something like:

category_name	customer_count
Electronics	3
Fashion	2
No preference	2

tip

Always include the grouped column(s) in your SELECT list. This makes your results meaningful by showing which group each aggregate value belongs to.

GROUP BY with Multiple Aggregates

You can use multiple aggregate functions in the same GROUP BY query:

Multiple aggregates per group
SELECT o.status,
       COUNT(*) AS total_orders,
       COUNT(DISTINCT o.customer_id) AS unique_customers,
       SUM(o.order_total) AS total_revenue,
       AVG(o.order_total) AS avg_order_total
FROM orders AS o
GROUP BY o.status
ORDER BY total_orders DESC;

Grouping by Multiple Columns

Sometimes you need to create more specific groups by combining multiple columns:

Grouping by category and order status
SELECT cat.category_name,
       o.status,
       COUNT(*) AS items_sold
FROM order_item AS oi
JOIN orders AS o ON o.order_id = oi.order_id
JOIN product AS p ON p.product_id = oi.product_id
JOIN category AS cat ON cat.category_id = p.category_id
GROUP BY cat.category_name, o.status
ORDER BY cat.category_name, o.status;

This creates groups for each unique combination of category and order status.

GROUP BY with WHERE Clause

You can filter rows before they're grouped using WHERE:

Filtering before grouping
SELECT cat.category_name,
       AVG(p.price) AS avg_price
FROM product AS p
JOIN category AS cat ON cat.category_id = p.category_id
WHERE p.price > 30
GROUP BY cat.category_name;

warning

The WHERE clause filters individual rows before grouping occurs. If you need to filter groups based on aggregate values, you'll need the HAVING clause (covered in the next lesson).

Real-World Example: Order Analysis

Let's work through a complete example with the marketplace data:

Category order summary
-- First, inspect the data we'll summarize
SELECT oi.order_item_id,
       o.order_id,
       p.sku,
       cat.category_name,
       o.order_date,
       o.status
FROM order_item AS oi
JOIN orders AS o ON o.order_id = oi.order_id
JOIN product AS p ON p.product_id = oi.product_id
JOIN category AS cat ON cat.category_id = p.category_id
LIMIT 5;

-- Now group order items by category and status for March 2025
SELECT 
    cat.category_name,
    o.status,
    COUNT(*) AS total_items,
    COUNT(DISTINCT o.customer_id) AS unique_customers,
    SUM(oi.quantity * oi.unit_price) AS revenue
FROM order_item AS oi
JOIN orders AS o ON o.order_id = oi.order_id
JOIN product AS p ON p.product_id = oi.product_id
JOIN category AS cat ON cat.category_id = p.category_id
WHERE o.order_date LIKE '2025-03%'
GROUP BY cat.category_name, o.status
ORDER BY total_items DESC;

Common Pitfalls

Forgetting non-aggregated columns: Every column in the SELECT clause must either be in the GROUP BY clause or wrapped in an aggregate function
Confusing WHERE and HAVING: Use WHERE to filter rows before grouping, HAVING to filter groups after aggregation
Over-grouping: Too many grouping columns can result in too many small groups, making analysis difficult
NULL values: GROUP BY treats all NULL values as a single group

Summary

The GROUP BY clause is essential for data analysis in SQL. It allows you to:

Organize data into logical groups
Calculate aggregate values for each group
Combine multiple grouping columns for detailed analysis
Work with WHERE to filter data before grouping

Remember that GROUP BY works hand-in-hand with aggregate functions to transform raw order, product, and customer data into meaningful summaries.

Show quiz

What happens if you include a column in SELECT that's not in GROUP BY and not wrapped in an aggregate function?
Can you use GROUP BY with columns from a WHERE clause filter?
How does GROUP BY handle NULL values in grouping columns?
What's the correct order of these clauses: GROUP BY, WHERE, ORDER BY?
True or False: You can only group by one column at a time.

Answers:

You'll get an error. All non-aggregated columns in SELECT must appear in GROUP BY.
Yes, WHERE filters rows before grouping occurs, so you can group by any column in the table.
All NULL values are grouped together into a single group.
WHERE -> GROUP BY -> ORDER BY
False, you can group by multiple columns separated by commas.

Understanding GROUP BY​

Basic GROUP BY Syntax​

Single Column Grouping​

GROUP BY with Multiple Aggregates​

Grouping by Multiple Columns​

GROUP BY with WHERE Clause​

Real-World Example: Order Analysis​

Common Pitfalls​

Summary​