Skip to main content

Working with Subqueries

Subqueries let you embed one query inside another. They are perfect for questions like "Which products are priced above the catalog average?" or "Which sellers offer items in Electronics or Fashion?" In this lesson we will write subqueries that rely on the marketplace schema so you can see how nested logic complements joins.

Learning Goals

  • Write subqueries in WHERE, SELECT, and FROM clauses
  • Distinguish between single-row and multi-row subqueries
  • Use correlated subqueries to compare each row against its peers
  • Decide when a join or a subquery communicates intent more clearly

Single-Row Subqueries

A single-row subquery returns exactly one value that you can compare with operators such as =, >, or <.

Products priced above the catalog average
SELECT sku,
product_name,
price
FROM product
WHERE price > (
SELECT AVG(price)
FROM product
);

The inner query calculates the catalog-wide average price. The outer query keeps only products above that average.

Multi-Row Subqueries

Multi-row subqueries return a list of values. Combine them with operators like IN, ANY, or ALL.

Sellers who list in key categories
SELECT first_name,
last_name,
store_name
FROM seller
WHERE seller_id IN (
SELECT seller_id
FROM product
WHERE category_id IN (1, 3)
);

Here the subquery identifies seller IDs that have products in the specified categories, and the outer query returns seller details.

Subqueries in the SELECT Clause

You can use a subquery to compute a value for each row in the result set. This pattern is handy when you need a count or aggregate that depends on the current row.

Category product counts
SELECT cat.category_name,
(SELECT COUNT(*)
FROM product AS p
WHERE p.category_id = cat.category_id) AS product_count
FROM category AS cat
ORDER BY product_count DESC;

Each time the outer query processes a category, the subquery counts how many products that category owns.

Subqueries in the FROM Clause

Subqueries in the FROM clause behave like temporary tables. They are often called derived tables.

Average price per category
SELECT category_name,
avg_price
FROM (
SELECT cat.category_name,
AVG(p.price) AS avg_price
FROM category AS cat
JOIN product AS p ON p.category_id = cat.category_id
GROUP BY cat.category_name
) AS category_stats
WHERE avg_price >= 40
ORDER BY avg_price DESC;

The inner query calculates the average price for each category; the outer query filters that derived table.

Correlated Subqueries

Correlated subqueries reference columns from the outer query. They run once for every row produced by the outer query, making them powerful and occasionally expensive.

Products with the highest price in their category
SELECT p.sku,
p.product_name,
cat.category_name,
p.price
FROM product AS p
JOIN category AS cat ON cat.category_id = p.category_id
WHERE p.price = (
SELECT MAX(p2.price)
FROM product AS p2
WHERE p2.category_id = p.category_id
);

For each product, the subquery checks the maximum price inside that product's category. Only the top-priced products per category remain.

EXISTS for Presence Checks

Another common correlated pattern uses EXISTS to test whether related rows exist.

Categories with orders in March 2025
SELECT category_name
FROM category AS cat
WHERE EXISTS (
SELECT 1
FROM order_item AS oi
JOIN orders AS o ON o.order_id = oi.order_id
JOIN product AS p ON p.product_id = oi.product_id
WHERE p.category_id = cat.category_id
AND o.order_date LIKE '2025-03%'
);

Subqueries vs. JOINs

Many problems can be solved using either subqueries or joins. Pick the approach that best expresses the requirement.

Products in Electronics
-- Using a subquery
SELECT sku, product_name
FROM product
WHERE category_id = (
SELECT category_id
FROM category
WHERE category_name = 'Electronics'
);

-- Using a join
SELECT p.sku, p.product_name
FROM product AS p
JOIN category AS cat ON cat.category_id = p.category_id
WHERE cat.category_name = 'Electronics';

The join version is often easier to extend when you need columns from both tables, while the subquery version can feel cleaner for simple lookups.

Common Pitfalls

  • Unexpected multi-row results: Single-row subqueries combined with = or > must return exactly one value. If the inner query returns more, the statement fails.
  • Correlated subquery performance: Because they execute once per outer row, they can be slow on large datasets.
  • NULL comparisons: If a subquery returns NULL, comparisons like = or IN may not behave as expected. Use EXISTS/NOT EXISTS or handle NULL explicitly.
  • Readability: Deeply nested subqueries are hard to maintain. Consider breaking them into views or common table expressions when they become complex.

Summary

  • Subqueries can appear in WHERE, SELECT, and FROM clauses.
  • Use multi-row subqueries with operators like IN, ANY, and ALL.
  • Correlated subqueries evaluate once per row and unlock comparisons within groups.
  • Decide between joins and subqueries based on clarity, required columns, and performance.

Quiz

Show quiz
  1. Which operator would you use with a subquery that returns several category IDs?
    A) =
    B) IN
    C) BETWEEN
    D) LIKE

  2. What makes a subquery "correlated"?
    A) It returns multiple rows
    B) It appears in the FROM clause
    C) It references columns from the outer query
    D) It uses GROUP BY

  3. In the query that counts products per category within the SELECT list, how many times does the subquery run?
    A) Once total
    B) Once per category row returned
    C) Once per product
    D) It depends on the database engine

  4. True or False: You can always rewrite a subquery as a join without changing the result.
    A) True
    B) False


Answers

  1. B
  2. C
  3. B
  4. B