Skip to main content

Grouping Data with GROUP BY

Now that you've mastered aggregate functions like COUNT(), SUM(), and AVG(), you're ready to take your data analysis skills to the next level. In this lesson, you'll learn how to use the GROUP BY clause to organize your data into meaningful groups and calculate aggregate values for each group separately.

Learning Goals:

  • Understand when and why to use GROUP BY
  • Write queries that group data by one or more columns
  • Combine GROUP BY with aggregate functions
  • Recognize the difference between WHERE and HAVING

Understanding GROUP BY

The GROUP BY clause divides rows returned by a query into groups based on one or more columns. For each group, you can apply aggregate functions to calculate summary values.

Think of it like organizing enrollment records: you could group them by semester to count how many students took classes each term, or group courses by department to calculate the average credit load in each program.

Basic GROUP BY Syntax

Basic GROUP BY structure
SELECT column1, aggregate_function(column2)
FROM table_name
GROUP BY column1;

The GROUP BY clause always comes after the FROM and WHERE clauses, but before ORDER BY.

Single Column Grouping

Let's start with a simple example from the University schema: counting how many students belong to each department.

Grouping students by department
SELECT d.dept_name,
COUNT(*) AS student_count
FROM student AS s
JOIN department AS d ON d.dept_id = s.dept_id
GROUP BY d.dept_name
ORDER BY student_count DESC;

This query would return something like:

dept_namestudent_count
Computer Science4
Business Administration2
Information Technology2
tip

Always include the grouped column(s) in your SELECT list. This makes your results meaningful by showing which group each aggregate value belongs to.

GROUP BY with Multiple Aggregates

You can use multiple aggregate functions in the same GROUP BY query:

Multiple aggregates per group
SELECT e.semester,
COUNT(*) AS total_enrollments,
COUNT(DISTINCT e.student_id) AS unique_students,
SUM(c.credits) AS total_credits,
AVG(c.credits) AS avg_course_credits
FROM enrollment AS e
JOIN course AS c ON c.course_id = e.course_id
GROUP BY e.semester
ORDER BY e.semester;

Grouping by Multiple Columns

Sometimes you need to create more specific groups by combining multiple columns:

Grouping by department and semester
SELECT d.dept_name,
e.semester,
COUNT(*) AS enrollment_count
FROM enrollment AS e
JOIN course AS c ON c.course_id = e.course_id
JOIN department AS d ON d.dept_id = c.dept_id
GROUP BY d.dept_name, e.semester
ORDER BY d.dept_name, e.semester;

This creates groups for each unique combination of department and semester.

GROUP BY with WHERE Clause

You can filter rows before they're grouped using WHERE:

Filtering before grouping
SELECT d.dept_name,
AVG(c.credits) AS avg_credits
FROM course AS c
JOIN department AS d ON d.dept_id = c.dept_id
WHERE c.credits > 3
GROUP BY d.dept_name;
warning

The WHERE clause filters individual rows before grouping occurs. If you need to filter groups based on aggregate values, you'll need the HAVING clause (covered in the next lesson).

Real-World Example: Enrollment Analysis

Let's work through a complete example with the University data:

Department enrollment summary
-- First, inspect the data we'll summarize
SELECT e.enroll_id,
s.student_id,
c.course_code,
d.dept_name,
e.semester,
e.grade
FROM enrollment AS e
JOIN student AS s ON s.student_id = e.student_id
JOIN course AS c ON c.course_id = e.course_id
JOIN department AS d ON d.dept_id = c.dept_id
LIMIT 5;

-- Now group enrollments by department and semester
SELECT
d.dept_name,
e.semester,
COUNT(*) AS total_enrollments,
COUNT(DISTINCT e.student_id) AS unique_students,
SUM(CASE WHEN e.grade IS NOT NULL THEN 1 ELSE 0 END) AS completed_courses
FROM enrollment AS e
JOIN course AS c ON c.course_id = e.course_id
JOIN department AS d ON d.dept_id = c.dept_id
WHERE e.semester LIKE 'Spring 2025'
GROUP BY d.dept_name, e.semester
ORDER BY total_enrollments DESC;

Common Pitfalls

  • Forgetting non-aggregated columns: Every column in the SELECT clause must either be in the GROUP BY clause or wrapped in an aggregate function
  • Confusing WHERE and HAVING: Use WHERE to filter rows before grouping, HAVING to filter groups after aggregation
  • Over-grouping: Too many grouping columns can result in too many small groups, making analysis difficult
  • NULL values: GROUP BY treats all NULL values as a single group

Summary

The GROUP BY clause is essential for data analysis in SQL. It allows you to:

  • Organize data into logical groups
  • Calculate aggregate values for each group
  • Combine multiple grouping columns for detailed analysis
  • Work with WHERE to filter data before grouping

Remember that GROUP BY works hand-in-hand with aggregate functions to transform raw student, course, and enrollment data into meaningful summaries.

Show quiz
  1. What happens if you include a column in SELECT that's not in GROUP BY and not wrapped in an aggregate function?
  2. Can you use GROUP BY with columns from a WHERE clause filter?
  3. How does GROUP BY handle NULL values in grouping columns?
  4. What's the correct order of these clauses: GROUP BY, WHERE, ORDER BY?
  5. True or False: You can only group by one column at a time.

Answers:

  1. You'll get an error. All non-aggregated columns in SELECT must appear in GROUP BY.
  2. Yes, WHERE filters rows before grouping occurs, so you can group by any column in the table.
  3. All NULL values are grouped together into a single group.
  4. WHERE -> GROUP BY -> ORDER BY
  5. False, you can group by multiple columns separated by commas.