Working with Subqueries
Subqueries let you embed one query inside another. They are perfect for questions like "Which products are priced above the catalog average?" or "Which sellers offer items in Electronics or Fashion?" In this lesson we will write subqueries that rely on the marketplace schema so you can see how nested logic complements joins.
Learning Goals
- Write subqueries in
WHERE,SELECT, andFROMclauses - Distinguish between single-row and multi-row subqueries
- Use correlated subqueries to compare each row against its peers
- Decide when a join or a subquery communicates intent more clearly
Single-Row Subqueries
A single-row subquery returns exactly one value that you can compare with operators such as =, >, or <.
SELECT sku,
product_name,
price
FROM product
WHERE price > (
SELECT AVG(price)
FROM product
);
The inner query calculates the catalog-wide average price. The outer query keeps only products above that average.
Multi-Row Subqueries
Multi-row subqueries return a list of values. Combine them with operators like IN, ANY, or ALL.
SELECT first_name,
last_name,
store_name
FROM seller
WHERE seller_id IN (
SELECT seller_id
FROM product
WHERE category_id IN (1, 3)
);
Here the subquery identifies seller IDs that have products in the specified categories, and the outer query returns seller details.
Subqueries in the SELECT Clause
You can use a subquery to compute a value for each row in the result set. This pattern is handy when you need a count or aggregate that depends on the current row.
SELECT cat.category_name,
(SELECT COUNT(*)
FROM product AS p
WHERE p.category_id = cat.category_id) AS product_count
FROM category AS cat
ORDER BY product_count DESC;
Each time the outer query processes a category, the subquery counts how many products that category owns.
Subqueries in the FROM Clause
Subqueries in the FROM clause behave like temporary tables. They are often called derived tables.
SELECT category_name,
avg_price
FROM (
SELECT cat.category_name,
AVG(p.price) AS avg_price
FROM category AS cat
JOIN product AS p ON p.category_id = cat.category_id
GROUP BY cat.category_name
) AS category_stats
WHERE avg_price >= 40
ORDER BY avg_price DESC;
The inner query calculates the average price for each category; the outer query filters that derived table.
Correlated Subqueries
Correlated subqueries reference columns from the outer query. They run once for every row produced by the outer query, making them powerful and occasionally expensive.
SELECT p.sku,
p.product_name,
cat.category_name,
p.price
FROM product AS p
JOIN category AS cat ON cat.category_id = p.category_id
WHERE p.price = (
SELECT MAX(p2.price)
FROM product AS p2
WHERE p2.category_id = p.category_id
);
For each product, the subquery checks the maximum price inside that product's category. Only the top-priced products per category remain.
EXISTS for Presence Checks
Another common correlated pattern uses EXISTS to test whether related rows exist.
SELECT category_name
FROM category AS cat
WHERE EXISTS (
SELECT 1
FROM order_item AS oi
JOIN orders AS o ON o.order_id = oi.order_id
JOIN product AS p ON p.product_id = oi.product_id
WHERE p.category_id = cat.category_id
AND o.order_date LIKE '2025-03%'
);
Subqueries vs. JOINs
Many problems can be solved using either subqueries or joins. Pick the approach that best expresses the requirement.
-- Using a subquery
SELECT sku, product_name
FROM product
WHERE category_id = (
SELECT category_id
FROM category
WHERE category_name = 'Electronics'
);
-- Using a join
SELECT p.sku, p.product_name
FROM product AS p
JOIN category AS cat ON cat.category_id = p.category_id
WHERE cat.category_name = 'Electronics';
The join version is often easier to extend when you need columns from both tables, while the subquery version can feel cleaner for simple lookups.
Common Pitfalls
- Unexpected multi-row results: Single-row subqueries combined with
=or>must return exactly one value. If the inner query returns more, the statement fails. - Correlated subquery performance: Because they execute once per outer row, they can be slow on large datasets.
- NULL comparisons: If a subquery returns
NULL, comparisons like=orINmay not behave as expected. UseEXISTS/NOT EXISTSor handleNULLexplicitly. - Readability: Deeply nested subqueries are hard to maintain. Consider breaking them into views or common table expressions when they become complex.
Summary
- Subqueries can appear in
WHERE,SELECT, andFROMclauses. - Use multi-row subqueries with operators like
IN,ANY, andALL. - Correlated subqueries evaluate once per row and unlock comparisons within groups.
- Decide between joins and subqueries based on clarity, required columns, and performance.
Quiz
Show quiz
-
Which operator would you use with a subquery that returns several category IDs?
A)=
B)IN
C)BETWEEN
D)LIKE -
What makes a subquery "correlated"?
A) It returns multiple rows
B) It appears in theFROMclause
C) It references columns from the outer query
D) It usesGROUP BY -
In the query that counts products per category within the
SELECTlist, how many times does the subquery run?
A) Once total
B) Once per category row returned
C) Once per product
D) It depends on the database engine -
True or False: You can always rewrite a subquery as a join without changing the result.
A) True
B) False
Answers
- B
- C
- B
- B