Advanced Aggregation Framework
Now that you've mastered the fundamentals of the aggregation pipeline, let's dive into more powerful operations that will transform how you analyze and process data in MongoDB. This lesson explores advanced aggregation stages and techniques that enable complex data transformations, statistical analysis, and sophisticated reporting.
Learning Goals:
- Master advanced aggregation stages like
$facet,$bucket, and$lookup - Implement complex data transformations and statistical operations
- Optimize aggregation pipelines for performance
- Handle advanced grouping and window functions
Multi-faceted Aggregations with $facet
The $facet stage allows you to execute multiple aggregation pipelines within a single stage, producing multiple result sets in one query. This is particularly useful for building comprehensive dashboards or reports.
// Analyze e-commerce data with multiple perspectives
db.orders.aggregate([
{
$facet: {
// Sales by category
"salesByCategory": [
{ $unwind: "$items" },
{
$group: {
_id: "$items.category",
totalSales: { $sum: "$items.price" },
averagePrice: { $avg: "$items.price" },
orderCount: { $sum: 1 }
}
},
{ $sort: { totalSales: -1 } }
],
// Monthly revenue trend
"monthlyRevenue": [
{
$group: {
_id: {
year: { $year: "$orderDate" },
month: { $month: "$orderDate" }
},
revenue: { $sum: "$total" },
orders: { $sum: 1 }
}
},
{ $sort: { "_id.year": 1, "_id.month": 1 } }
],
// Customer segmentation
"customerStats": [
{
$group: {
_id: "$customerId",
totalSpent: { $sum: "$total" },
orderCount: { $sum: 1 },
firstOrder: { $min: "$orderDate" },
lastOrder: { $max: "$orderDate" }
}
},
{
$bucket: {
groupBy: "$totalSpent",
boundaries: [0, 100, 500, 1000, 5000],
default: "VIP",
output: {
customers: { $sum: 1 },
averageOrders: { $avg: "$orderCount" }
}
}
}
]
}
}
])
Use $facet when you need multiple different aggregations from the same dataset. It's more efficient than running separate queries since it processes the data only once.
Advanced Grouping with $bucket and $bucketAuto
Bucket operations help you categorize data into ranges, perfect for histograms, price ranges, or age groups.
// Segment customers by spending using $bucket
db.orders.aggregate([
{
$group: {
_id: "$customerId",
totalSpent: { $sum: "$total" },
orderCount: { $sum: 1 }
}
},
{
$bucket: {
groupBy: "$totalSpent",
boundaries: [0, 50, 200, 500, 1000],
default: "High Value",
output: {
count: { $sum: 1 },
avgOrderCount: { $avg: "$orderCount" },
topSpenders: {
$push: {
customerId: "$_id",
totalSpent: "$totalSpent"
}
}
}
}
}
])
// Let MongoDB determine optimal bucket ranges
db.products.aggregate([
{
$bucketAuto: {
groupBy: "$price",
buckets: 5,
output: {
count: { $sum: 1 },
priceRange: {
$push: {
product: "$name",
price: "$price"
}
},
avgPrice: { $avg: "$price" }
}
}
}
])
Advanced Lookup Operations
Go beyond basic $lookup with advanced joining techniques and pipeline-based lookups.
// Advanced lookup with filtering and transformation
db.orders.aggregate([
{
$lookup: {
from: "customers",
let: { customerId: "$customerId" },
pipeline: [
{ $match: { $expr: { $eq: ["$_id", "$$customerId"] } } },
{
$project: {
name: 1,
email: 1,
membershipLevel: 1,
joinDate: 1
}
}
],
as: "customerInfo"
}
},
{
$unwind: {
path: "$customerInfo",
preserveNullAndEmptyArrays: true // Left join behavior
}
}
])
Window Functions and Advanced Analytics
MongoDB 5.0+ introduced window functions for advanced analytical operations.
// Calculate running totals and moving averages
db.sales.aggregate([
{ $sort: { date: 1 } },
{
$setWindowFields: {
partitionBy: "$region",
sortBy: { date: 1 },
output: {
runningTotal: {
$sum: "$amount",
window: {
documents: ["unbounded", "current"]
}
},
movingAvg: {
$avg: "$amount",
window: {
documents: [-2, 0] // Current and previous 2 documents
}
},
salesRank: {
$rank: {}
}
}
}
}
])
Array Manipulation and Advanced Operators
Master complex array operations for nested data structures.
// Advanced array filtering and transformation
db.orders.aggregate([
{ $unwind: "$items" },
{
$group: {
_id: "$_id",
orderDate: { $first: "$orderDate" },
customerId: { $first: "$customerId" },
highValueItems: {
$push: {
$cond: {
if: { $gte: ["$items.price", 100] },
then: "$items",
else: "$$REMOVE"
}
}
},
totalHighValue: {
$sum: {
$cond: {
if: { $gte: ["$items.price", 100] },
then: "$items.price",
else: 0
}
}
}
}
},
{
$match: {
totalHighValue: { $gt: 500 }
}
}
])
Complex array operations can significantly impact performance. Always test with appropriate indexes and consider your data volume before deploying to production.
Conditional Logic and Dynamic Fields
Use conditional operators to create dynamic aggregations.
// Dynamic categorization and conditional calculations
db.orders.aggregate([
{
$addFields: {
orderSize: {
$switch: {
branches: [
{ case: { $lt: ["$total", 50] }, then: "Small" },
{ case: { $lt: ["$total", 200] }, then: "Medium" },
{ case: { $lt: ["$total", 500] }, then: "Large" }
],
default: "Extra Large"
}
},
isWeekend: {
$in: [
{ $dayOfWeek: "$orderDate" },
[1, 7] // Sunday = 1, Saturday = 7
]
}
}
},
{
$group: {
_id: {
size: "$orderSize",
weekend: "$isWeekend"
},
totalRevenue: { $sum: "$total" },
averageOrderValue: { $avg: "$total" },
orderCount: { $sum: 1 }
}
}
])
Performance Optimization Techniques
- JavaScript
- TypeScript
// Optimized aggregation with proper stage ordering
db.orders.aggregate([
// Early filtering reduces documents processed
{ $match: {
orderDate: {
$gte: ISODate("2024-01-01"),
$lt: ISODate("2024-02-01")
},
status: "completed"
}},
// Project only needed fields early
{ $project: {
customerId: 1,
total: 1,
orderDate: 1,
items: 1
}},
// Unwind after filtering
{ $unwind: "$items" },
// Final grouping and sorting
{
$group: {
_id: "$customerId",
totalSpent: { $sum: "$total" },
itemCount: { $sum: "$items.quantity" }
}
},
{ $sort: { totalSpent: -1 } },
// Limit results for pagination
{ $limit: 100 }
])
// TypeScript interface for aggregation results
interface CustomerSpending {
_id: string;
totalSpent: number;
itemCount: number;
}
// Optimized pipeline with type safety
const pipeline = [
{
$match: {
orderDate: {
$gte: new Date("2024-01-01"),
$lt: new Date("2024-02-01")
},
status: "completed"
}
},
{
$project: {
customerId: 1,
total: 1,
orderDate: 1,
items: 1
}
},
{ $unwind: "$items" },
{
$group: {
_id: "$customerId",
totalSpent: { $sum: "$total" },
itemCount: { $sum: "$items.quantity" }
}
},
{ $sort: { totalSpent: -1 } },
{ $limit: 100 }
];
const results = await db.collection('orders')
.aggregate<CustomerSpending>(pipeline)
.toArray();
Common Pitfalls
- Memory Limits: Complex aggregations can exceed the 100MB memory limit. Use
$limitand$matchearly to reduce dataset size - Unoptimized Unwinding: Avoid unwinding large arrays early in the pipeline; filter documents first
- Missing Indexes: Ensure
$matchand$sortstages use appropriate indexes for optimal performance - Pipeline Order: Incorrect stage ordering can dramatically impact performance; filter early, project needed fields
- Overusing
$facet: While powerful,$facetprocesses the entire dataset for each sub-pipeline; use judiciously
Summary
You've now mastered advanced aggregation techniques that enable sophisticated data analysis in MongoDB. The $facet stage allows multi-dimensional analysis, bucket operations enable data segmentation, window functions provide advanced analytics, and optimized pipeline design ensures performance. These tools transform MongoDB from a simple document store into a powerful analytical engine capable of handling complex business intelligence requirements.
Show quiz
-
What is the primary benefit of using the
$facetstage in an aggregation pipeline?- A) It makes queries faster by using parallel processing
- B) It allows multiple aggregation pipelines to run on the same dataset
- C) It automatically creates indexes for better performance
- D) It reduces memory usage by compressing results
-
When using
$bucketAuto, what determines the range boundaries for each bucket?- A) The developer specifies exact boundaries
- B) MongoDB automatically calculates evenly distributed ranges
- C) Boundaries are based on statistical outliers
- D) Each bucket contains exactly the same number of documents
-
Which optimization technique is most effective for improving aggregation performance?
- A) Using
$facetfor all complex queries - B) Placing
$matchand$limitstages early in the pipeline - C) Avoiding the use of
$unwindentirely - D) Using JavaScript functions within
$function
- A) Using
-
What is a key consideration when using
$unwindon arrays?- A) It should always be the first stage in any pipeline
- B) It creates a new document for each array element, potentially multiplying document count
- C) It only works on arrays with less than 1000 elements
- D) It automatically filters out empty arrays
Answers:
- B -
$facetexecutes multiple aggregation pipelines within a single stage - B -
$bucketAutoautomatically calculates evenly distributed ranges based on the data - B - Early filtering with
$matchand$limitreduces the number of documents processed - B -
$unwindcreates a document for each array element, which can significantly increase document count