Monitoring with CloudWatch and CloudTrail
Now that you understand AWS core services and security fundamentals, it's time to learn how to monitor and track activity across your cloud environment. In this lesson, you'll discover how Amazon CloudWatch and AWS CloudTrail work together to give you visibility into infrastructure performance and security events.
Learning Goals:
- Understand CloudWatch metrics, alarms, and logs for performance monitoring
- Configure CloudWatch alarms to notify you of operational issues
- Use CloudTrail to track API activity and security events
- Create dashboards to visualize your application health
- Set up basic monitoring for common AWS services
CloudWatch: Monitoring Your AWS Resources
Amazon CloudWatch is a monitoring service that collects and tracks metrics, stores and searches log data, and evaluates alarms against the metrics you choose. Think of it as the central nervous system for observing your AWS resources and applications.
Key CloudWatch Concepts
Metrics are time-ordered data points representing the performance of your AWS services. For example, EC2 instances publish CPU utilization metrics, while S3 buckets report storage size and, once request metrics are enabled, request counts.
Alarms watch metrics over time and trigger actions when thresholds are breached. You can send notifications, auto-scale resources, or run Lambda functions.
Logs capture application and system logs from EC2 instances, Lambda functions, and other sources.
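To make the alarm concept concrete, here is a minimal sketch of how a "greater than threshold" alarm could decide its state from recent datapoints. It is a simplified model, not CloudWatch's actual implementation: the function name `evaluate_alarm` is hypothetical, and real alarms also support "M out of N" datapoints and configurable handling of missing data.

```python
from typing import Sequence


def evaluate_alarm(datapoints: Sequence[float], threshold: float,
                   evaluation_periods: int) -> str:
    """Return a simplified alarm state for a GreaterThanThreshold alarm.

    datapoints holds one aggregated value per period, oldest first.
    """
    if len(datapoints) < evaluation_periods:
        # Not enough periods collected yet to make a decision.
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    # ALARM only if every one of the most recent periods breaches the threshold.
    if all(value > threshold for value in recent):
        return "ALARM"
    return "OK"


if __name__ == "__main__":
    # Five-minute CPU averages (percent), oldest first.
    cpu_averages = [42.0, 85.5, 91.2]
    print(evaluate_alarm(cpu_averages, threshold=80, evaluation_periods=2))
```

Requiring several consecutive breaching periods is what keeps a single short spike from triggering a notification.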
Creating Your First CloudWatch Alarm
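Let's create a simple alarm that monitors CPU utilization on a single EC2 instance. With a 300-second period and two evaluation periods, the alarm fires once average CPU has stayed above 80 percent for two consecutive five-minute periods (about 10 minutes). The instance ID and SNS topic ARN are placeholders; the --dimensions option is what ties the alarm to one instance:
aws cloudwatch put-metric-alarm \
  --alarm-name "High-CPU-Utilization" \
  --alarm-description "Alarm when CPU exceeds 80 percent" \
  --metric-name CPUUtilization \
  --namespace AWS/EC2 \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --statistic Average \
  --period 300 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 2 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:my-sns-topic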
Use the AWS Management Console to explore available metrics before creating alarms. Navigate to CloudWatch → Metrics → All metrics to see what data each service provides.
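If you prefer to manage alarms from code, the same CPU alarm can be sketched with boto3's `put_metric_alarm`. This is an illustrative sketch: the helper names `build_cpu_alarm_params` and `create_cpu_alarm` are hypothetical, the instance ID and topic ARN are placeholders you supply, and the client is passed in so the parameter-building logic can be checked without AWS credentials.

```python
from typing import Any, Dict


def build_cpu_alarm_params(instance_id: str, topic_arn: str) -> Dict[str, Any]:
    """Build keyword arguments for CloudWatch's put_metric_alarm call."""
    return {
        "AlarmName": "High-CPU-Utilization",
        "AlarmDescription": "Alarm when CPU exceeds 80 percent",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        # The dimension scopes the alarm to a single instance.
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,            # seconds per evaluation period
        "EvaluationPeriods": 2,   # consecutive periods that must breach
        "Threshold": 80.0,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],
    }


def create_cpu_alarm(cloudwatch_client: Any, instance_id: str,
                     topic_arn: str) -> None:
    """Create or update the alarm using a boto3 CloudWatch client."""
    params = build_cpu_alarm_params(instance_id, topic_arn)
    cloudwatch_client.put_metric_alarm(**params)
```

In a real script you would call it with `boto3.client("cloudwatch")` as the first argument. Calling `put_metric_alarm` again with the same alarm name updates the existing alarm rather than creating a duplicate.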
Working with CloudWatch Logs
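CloudWatch Logs can capture and monitor logs from various sources. Structured (JSON) log lines are much easier to search and filter once they arrive there. The example below uses the third-party python-json-logger package to write JSON log lines to the standard error stream; it does not call AWS itself. A runtime integration or agent (for example, Lambda's built-in log capture, or the CloudWatch agent on EC2 reading a log file) is what delivers the output to a log group:
import logging
from pythonjsonlogger import jsonlogger

# Configure the root logger to emit JSON-formatted records
logger = logging.getLogger()
logger.setLevel(logging.INFO)  # root logger defaults to WARNING, which would drop info() calls
log_handler = logging.StreamHandler()
log_handler.setFormatter(jsonlogger.JsonFormatter())
logger.addHandler(log_handler)

# Application code with structured logging
def process_data(data):
    try:
        # Keys passed via `extra` become fields in the JSON output
        logger.info("Processing started", extra={'data_size': len(data)})
        # Your processing logic here
        logger.info("Processing completed successfully")
    except Exception as e:
        logger.error("Processing failed", extra={'error': str(e)})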
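If no agent or runtime integration is available, an application can also send log lines itself through the CloudWatch Logs `put_log_events` API. The sketch below only prepares the batch: each event needs a millisecond timestamp and a message string, and events in a batch should be in chronological order. The helper names `to_log_event` and `build_batch` are hypothetical, and the actual boto3 call is shown as a comment because it needs credentials and an existing log group and stream.

```python
import json
import time
from typing import Any, Dict, List, Optional


def to_log_event(message: str, fields: Dict[str, Any],
                 timestamp_ms: Optional[int] = None) -> Dict[str, Any]:
    """Wrap a structured record in the shape put_log_events expects."""
    if timestamp_ms is None:
        # CloudWatch Logs timestamps are milliseconds since the Unix epoch.
        timestamp_ms = int(time.time() * 1000)
    payload = {"message": message, **fields}
    return {"timestamp": timestamp_ms, "message": json.dumps(payload)}


def build_batch(events: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the events sorted oldest first, as the API expects."""
    return sorted(events, key=lambda event: event["timestamp"])


if __name__ == "__main__":
    batch = build_batch([
        to_log_event("Processing started", {"data_size": 3}),
        to_log_event("Processing completed successfully", {}),
    ])
    print(batch)
    # With credentials configured, the batch could be sent like this
    # (log group and stream names are placeholders):
    # boto3.client("logs").put_log_events(
    #     logGroupName="my-app-logs",
    #     logStreamName="worker-1",
    #     logEvents=batch,
    # )
```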
CloudTrail: Tracking API Activity
AWS CloudTrail records API calls made in your AWS account, providing a history of who did what, when, and from where. This is crucial for security analysis, compliance, and troubleshooting.
Understanding CloudTrail Events
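CloudTrail captures several types of events. The two you'll work with most often are:
- Management Events: Control-plane operations on resources (create, delete, update), including read-only calls such as Describe and List operations
- Data Events: Operations performed on or within a resource (S3 object reads and writes, Lambda function invocations)
A simplified management event for launching an EC2 instance looks like this: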
{
  "eventVersion": "1.08",
  "eventTime": "2024-01-15T12:30:45Z",
  "eventName": "RunInstances",
  "awsRegion": "us-east-1",
  "sourceIPAddress": "203.0.113.12",
  "userIdentity": {
    "type": "IAMUser",
    "userName": "alice@example.com"
  },
  "requestParameters": {
    "instanceType": "t3.micro",
    "imageId": "ami-0abcdef1234567890"
  },
  "responseElements": {
    "instanceId": "i-1234567890abcdef0"
  }
}
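Reading an event like this usually comes down to pulling out who did what, when, and from where. The following sketch does that for a trimmed copy of the sample event using only the standard library. The function name `summarize_event` and the output keys are hypothetical choices for illustration; real events carry many more fields, and log files delivered to S3 wrap events in a top-level `Records` array.

```python
import json
from typing import Any, Dict

# Trimmed copy of the sample event from this lesson.
SAMPLE_EVENT = """
{
  "eventTime": "2024-01-15T12:30:45Z",
  "eventName": "RunInstances",
  "awsRegion": "us-east-1",
  "sourceIPAddress": "203.0.113.12",
  "userIdentity": {"type": "IAMUser", "userName": "alice@example.com"},
  "responseElements": {"instanceId": "i-1234567890abcdef0"}
}
"""


def summarize_event(event: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the who/what/when/where fields from one CloudTrail event."""
    identity = event.get("userIdentity") or {}
    return {
        # Not every identity type has a userName, so fall back to the type.
        "who": identity.get("userName") or identity.get("type", "unknown"),
        "what": event.get("eventName"),
        "when": event.get("eventTime"),
        "where": event.get("sourceIPAddress"),
        "region": event.get("awsRegion"),
    }


if __name__ == "__main__":
    print(summarize_event(json.loads(SAMPLE_EVENT)))
```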
Enabling CloudTrail
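Create a trail to deliver a continuous record of API activity to an S3 bucket. The bucket must already exist and have a bucket policy that allows CloudTrail to write to it. A trail created from the CLI does not log until you start it: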
aws cloudtrail create-trail \
--name my-management-trail \
--s3-bucket-name my-cloudtrail-logs \
--is-multi-region-trail
aws cloudtrail start-logging --name my-management-trail
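CloudTrail records management events by default and keeps the most recent 90 days in Event history, but it doesn't log data events. A trail gives you long-term storage of events in S3. Enable data event logging separately for services like S3 if you need to track data access patterns.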
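Data event logging is configured per trail. One way to do this is with basic event selectors through boto3's `put_event_selectors`, sketched below for the objects in a single S3 bucket. The helper names are hypothetical, the trail and bucket names are placeholders, and advanced event selectors (not shown) offer finer-grained filtering.

```python
from typing import Any, Dict, List


def s3_data_event_selectors(bucket_name: str,
                            prefix: str = "") -> List[Dict[str, Any]]:
    """Build a basic event selector that logs S3 object-level events."""
    return [
        {
            "ReadWriteType": "All",           # log both reads and writes
            "IncludeManagementEvents": True,  # keep logging management events
            "DataResources": [
                {
                    "Type": "AWS::S3::Object",
                    # A trailing slash after the bucket name covers all
                    # objects; a prefix narrows the scope (and the cost).
                    "Values": [f"arn:aws:s3:::{bucket_name}/{prefix}"],
                }
            ],
        }
    ]


def enable_s3_data_events(cloudtrail_client: Any, trail_name: str,
                          bucket_name: str, prefix: str = "") -> None:
    """Apply the selector to an existing trail via a boto3 CloudTrail client."""
    cloudtrail_client.put_event_selectors(
        TrailName=trail_name,
        EventSelectors=s3_data_event_selectors(bucket_name, prefix),
    )
```

Scoping the selector to one bucket or prefix, rather than every bucket in the account, is the simplest way to keep data event volume under control.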
Building Monitoring Dashboards
Dashboards let you view metrics and log query results from several services in one place. The definition below combines an EC2 metric widget with a CloudWatch Logs Insights widget. A log widget's query names its log group in a SOURCE clause; the log group name my-app-logs and the instance ID are placeholders:
{
  "widgets": [
    {
      "type": "metric",
      "properties": {
        "metrics": [
          ["AWS/EC2", "CPUUtilization", "InstanceId", "i-1234567890abcdef0"],
          ["AWS/EC2", "NetworkIn", "InstanceId", "i-1234567890abcdef0"]
        ],
        "period": 300,
        "stat": "Average",
        "region": "us-east-1",
        "title": "EC2 Instance Performance"
      }
    },
    {
      "type": "log",
      "properties": {
        "query": "SOURCE 'my-app-logs' | fields @timestamp, @message | filter @message like /ERROR/ | sort @timestamp desc | limit 20",
        "region": "us-east-1",
        "title": "Recent Application Errors"
      }
    }
  ]
}
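A dashboard definition like this can also be generated and published from code, which helps when you want the same layout for many instances. The sketch below builds the body as a JSON string and passes it to boto3's `put_dashboard`. The helper names and the dashboard and log group names are hypothetical, and the log widget's `SOURCE` clause assumes your application writes to the named log group.

```python
import json
from typing import Any


def build_dashboard_body(instance_id: str, region: str, log_group: str) -> str:
    """Return a dashboard body (JSON string) with one metric and one log widget."""
    body = {
        "widgets": [
            {
                "type": "metric",
                "properties": {
                    "metrics": [
                        ["AWS/EC2", "CPUUtilization", "InstanceId", instance_id],
                        ["AWS/EC2", "NetworkIn", "InstanceId", instance_id],
                    ],
                    "period": 300,
                    "stat": "Average",
                    "region": region,
                    "title": "EC2 Instance Performance",
                },
            },
            {
                "type": "log",
                "properties": {
                    # Logs Insights query; SOURCE names the log group to search.
                    "query": (
                        f"SOURCE '{log_group}' | fields @timestamp, @message "
                        "| filter @message like /ERROR/ "
                        "| sort @timestamp desc | limit 20"
                    ),
                    "region": region,
                    "title": "Recent Application Errors",
                },
            },
        ]
    }
    return json.dumps(body)


def publish_dashboard(cloudwatch_client: Any, name: str, body: str) -> None:
    """Create or replace a dashboard using a boto3 CloudWatch client."""
    cloudwatch_client.put_dashboard(DashboardName=name, DashboardBody=body)
```

For example, `publish_dashboard(boto3.client("cloudwatch"), "app-health", build_dashboard_body("i-1234567890abcdef0", "us-east-1", "my-app-logs"))` would create or overwrite a dashboard named `app-health`.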
Common Pitfalls
- Ignoring CloudTrail Costs: Data events (such as S3 object-level logging) can generate massive volumes of logs and incur significant costs. Be selective about what you log.
- Over-alerting: Alarms that trigger too easily lead to alert fatigue. Start with conservative thresholds and adjust based on actual application behavior.
- Missing Cross-Region Logging: A trail created with the CLI or API covers a single Region unless you request otherwise (the console creates multi-region trails by default). Use multi-region trails for comprehensive security monitoring.
- Insufficient Log Retention: By default, CloudWatch Logs log groups never expire, CloudTrail log files delivered to S3 stay until you delete them, and CloudTrail Event history covers only the last 90 days. Set explicit retention and lifecycle policies based on compliance and cost requirements.
- Not Monitoring Cost Metrics: Forgetting to set up billing alarms can lead to unexpected charges. Create a CloudWatch billing alarm or an AWS Budgets alert so cost spikes are noticed early.
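Setting an explicit retention period on a log group can be sketched as follows. CloudWatch Logs accepts only specific values for `retentionInDays`, so the hypothetical helper `pick_retention_days` rounds a compliance requirement up to the nearest allowed value. The list of allowed values reflects the API documentation as best known at the time of writing and should be checked against the current reference before you rely on it.

```python
from bisect import bisect_left
from typing import Any

# Retention periods accepted by put_retention_policy (assumption: verify
# against the current CloudWatch Logs API reference).
ALLOWED_RETENTION_DAYS = [
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731,
    1096, 1827, 2192, 2557, 2922, 3288, 3653,
]


def pick_retention_days(required_days: int) -> int:
    """Return the smallest allowed retention that covers required_days."""
    index = bisect_left(ALLOWED_RETENTION_DAYS, required_days)
    if index == len(ALLOWED_RETENTION_DAYS):
        raise ValueError(f"No allowed retention covers {required_days} days")
    return ALLOWED_RETENTION_DAYS[index]


def apply_retention(logs_client: Any, log_group: str, required_days: int) -> int:
    """Set the log group's retention using a boto3 CloudWatch Logs client."""
    days = pick_retention_days(required_days)
    logs_client.put_retention_policy(logGroupName=log_group, retentionInDays=days)
    return days
```

Rounding up rather than down means logs are never deleted earlier than your requirement allows, at the cost of storing them slightly longer.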
Summary
CloudWatch provides real-time monitoring of your AWS resources through metrics, alarms, and logs, while CloudTrail delivers comprehensive audit trails of API activity. Together, they form the foundation of observability in AWS, enabling you to maintain application health, troubleshoot issues efficiently, and meet security compliance requirements. Remember to configure appropriate alarm thresholds, enable multi-region CloudTrail logging, and establish log retention policies that match your organizational needs.