Skip to main content

Working with Subqueries

Subqueries let you embed one query inside another. They are perfect for questions like “Which courses are harder than the catalog average?” or “Which students share a department with Professor Mehta?” In this lesson we will write subqueries that rely on the University schema so you can see how nested logic complements joins.

Learning Goals

  • Write subqueries in WHERE, SELECT, and FROM clauses
  • Distinguish between single-row and multi-row subqueries
  • Use correlated subqueries to compare each row against its peers
  • Decide when a join or a subquery communicates intent more clearly

Single-Row Subqueries

A single-row subquery returns exactly one value that you can compare with operators such as =, >, or <.

Courses harder than the catalog average
SELECT course_code,
course_title,
credits
FROM course
WHERE credits > (
SELECT AVG(credits)
FROM course
);

The inner query calculates the catalog-wide average credit value. The outer query keeps only courses above that average.

Multi-Row Subqueries

Multi-row subqueries return a list of values. Combine them with operators like IN, ANY, or ALL.

Instructors in select departments
SELECT first_name,
last_name,
dept_id
FROM instructor
WHERE dept_id IN (
SELECT dept_id
FROM department
WHERE dept_name IN ('Computer Science', 'Mathematics')
);

Here the subquery identifies the department IDs for two programs, and the outer query returns instructors who belong to either.

Subqueries in the SELECT Clause

You can use a subquery to compute a value for each row in the result set. This pattern is handy when you need a count or aggregate that depends on the current row.

Department course counts
SELECT d.dept_name,
(SELECT COUNT(*)
FROM course AS c
WHERE c.dept_id = d.dept_id) AS course_count
FROM department AS d
ORDER BY course_count DESC;

Each time the outer query processes a department, the subquery counts how many courses that department owns.

Subqueries in the FROM Clause

Subqueries in the FROM clause behave like temporary tables. They are often called derived tables.

Average credits per department
SELECT dept_name,
avg_credits
FROM (
SELECT d.dept_name,
AVG(c.credits) AS avg_credits
FROM department AS d
JOIN course AS c ON c.dept_id = d.dept_id
GROUP BY d.dept_name
) AS dept_stats
WHERE avg_credits >= 3.5
ORDER BY avg_credits DESC;

The inner query calculates the average credit load for each department; the outer query filters that derived table.

Correlated Subqueries

Correlated subqueries reference columns from the outer query. They run once for every row produced by the outer query, making them powerful and occasionally expensive.

Courses with the highest credit value in their department
SELECT c.course_code,
c.course_title,
d.dept_name,
c.credits
FROM course AS c
JOIN department AS d ON d.dept_id = c.dept_id
WHERE c.credits = (
SELECT MAX(c2.credits)
FROM course AS c2
WHERE c2.dept_id = c.dept_id
);

For each course, the subquery checks the maximum credit value inside that course's department. Only the top-credit courses per department remain.

EXISTS for Presence Checks

Another common correlated pattern uses EXISTS to test whether related rows exist.

Departments with enrollments in Spring 2025
SELECT dept_name
FROM department AS d
WHERE EXISTS (
SELECT 1
FROM enrollment AS e
JOIN course AS c ON c.course_id = e.course_id
WHERE c.dept_id = d.dept_id
AND e.semester = 'Spring 2025'
);

Subqueries vs. JOINs

Many problems can be solved using either subqueries or joins. Pick the approach that best expresses the requirement.

Courses offered by Computer Science
-- Using a subquery
SELECT course_code, course_title
FROM course
WHERE dept_id = (
SELECT dept_id
FROM department
WHERE dept_name = 'Computer Science'
);

-- Using a join
SELECT c.course_code, c.course_title
FROM course AS c
JOIN department AS d ON d.dept_id = c.dept_id
WHERE d.dept_name = 'Computer Science';

The join version is often easier to extend when you need columns from both tables, while the subquery version can feel cleaner for simple lookups.

Common Pitfalls

  • Unexpected multi-row results: Single-row subqueries combined with = or > must return exactly one value. If the inner query returns more, the statement fails.
  • Correlated subquery performance: Because they execute once per outer row, they can be slow on large datasets.
  • NULL comparisons: If a subquery returns NULL, comparisons like = or IN may not behave as expected. Use EXISTS/NOT EXISTS or handle NULL explicitly.
  • Readability: Deeply nested subqueries are hard to maintain. Consider breaking them into views or common table expressions when they become complex.

Summary

  • Subqueries can appear in WHERE, SELECT, and FROM clauses.
  • Use multi-row subqueries with operators like IN, ANY, and ALL.
  • Correlated subqueries evaluate once per row and unlock comparisons within groups.
  • Decide between joins and subqueries based on clarity, required columns, and performance.

Quiz

Show quiz
  1. Which operator would you use with a subquery that returns several department IDs?
    A) =
    B) IN
    C) BETWEEN
    D) LIKE

  2. What makes a subquery “correlated”?
    A) It returns multiple rows
    B) It appears in the FROM clause
    C) It references columns from the outer query
    D) It uses GROUP BY

  3. In the query that counts courses per department within the SELECT list, how many times does the subquery run?
    A) Once total
    B) Once per department row returned
    C) Once per course
    D) It depends on the database engine

  4. True or False: You can always rewrite a subquery as a join without changing the result.
    A) True
    B) False


Answers

  1. B
  2. C
  3. B
  4. B