Nginx High Availability and Failover Setup
In this final lesson, we'll explore how to ensure your Nginx infrastructure remains available even when individual components fail. Building on your knowledge of load balancing and monitoring, you'll learn to implement robust high availability solutions that keep your services running smoothly.
Learning Goals:
- Understand high availability concepts and failover strategies
- Implement active-passive Nginx configurations
- Configure health checks and automatic failover
- Set up shared storage and session persistence
- Monitor and test your high availability setup
Understanding High Availability Concepts
High availability (HA) ensures your services remain accessible even during hardware failures, network issues, or maintenance. The key principle is eliminating single points of failure.
Active-Passive vs Active-Active
- Active-Passive: One server handles traffic while the others stand by as backups.
- Active-Active: Multiple servers share the load simultaneously.
For most web applications, active-passive configurations provide the best balance of simplicity and reliability. Active-active requires more complex session management but offers better resource utilization.
Implementing Active-Passive Nginx with Keepalived
Keepalived provides IP failover capabilities, allowing a backup server to automatically take over a virtual IP address when the primary fails.
Installing Keepalived
# Ubuntu/Debian
sudo apt update
sudo apt install keepalived

# CentOS/RHEL
sudo yum install keepalived
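Keepalived runs as an ordinary system service. Once the configuration shown in the next sections is in place on both servers, enable and start it (assuming a systemd-based distribution):

sudo systemctl enable --now keepalived
sudo systemctl status keepalived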
Configuring Keepalived on the Primary Server
Edit /etc/keepalived/keepalived.conf on the primary server:
vrrp_script chk_nginx {
    script "/usr/bin/killall -0 nginx"   # exits non-zero if no nginx process is running
    interval 2                           # run the check every 2 seconds
    weight -60                           # on failure, drop priority below the backup's (100 - 60 < 50)
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass secret123              # keepalived uses only the first 8 characters
    }
    virtual_ipaddress {
        192.168.1.100/24
    }
    track_script {
        chk_nginx
    }
}
Configuring Keepalived on the Backup Server
The backup uses an almost identical configuration, with the BACKUP state and a lower priority. Edit /etc/keepalived/keepalived.conf on the backup server:
vrrp_script chk_nginx {
    script "/usr/bin/killall -0 nginx"
    interval 2
    weight -60
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51                 # must match the primary
    priority 50                          # lower than the primary's priority
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass secret123              # must match the primary
    }
    virtual_ipaddress {
        192.168.1.100/24
    }
    track_script {
        chk_nginx
    }
}
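After both nodes are configured, restart keepalived and confirm that the virtual IP is held by the primary. A minimal check, assuming the interface and addresses used above:

sudo systemctl restart keepalived

# On the primary: the VIP should appear as an additional address on eth0
ip -4 addr show dev eth0 | grep 192.168.1.100

# Watch VRRP state transitions in the logs
sudo journalctl -u keepalived -f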
Health Check Configuration
Robust health checks are crucial for automatic failover. Let's implement comprehensive health monitoring.
Nginx Status Page
server {
    listen 127.0.0.1:8080;
    server_name localhost;

    location /nginx_status {
        stub_status on;                  # expose basic connection and request counters
        access_log off;
        allow 127.0.0.1;
        deny all;
    }

    location /health {
        access_log off;
        allow 127.0.0.1;
        deny all;
        default_type text/plain;
        # Extend with custom health check logic as needed
        return 200 "healthy\n";
    }
}
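From the server itself you can verify that both endpoints respond. The stub_status output below is illustrative; the counters on your server will differ:

curl http://127.0.0.1:8080/health
# healthy

curl http://127.0.0.1:8080/nginx_status
# Active connections: 2
# server accepts handled requests
#  16 16 31
# Reading: 0 Writing: 1 Waiting: 1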
Advanced Health Check Script
#!/bin/bash
# Check if Nginx process is running
if ! killall -0 nginx 2>/dev/null; then
exit 1
fi
# Check if Nginx responds on local port
if ! curl -f http://127.0.0.1:80/ >/dev/null 2>&1; then
exit 1
fi
# Check if status page is accessible
if ! curl -f http://127.0.0.1:8080/nginx_status >/dev/null 2>&1; then
exit 1
fi
exit 0
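To use this script as the keepalived health check instead of the simple process test, install it on both nodes and point the vrrp_script block at it. This assumes you saved the script above as check_nginx.sh; the installed path is only an example:

sudo install -m 755 check_nginx.sh /usr/local/bin/check_nginx.sh

vrrp_script chk_nginx {
    script "/usr/local/bin/check_nginx.sh"
    interval 2
    fall 2                # require two consecutive failures before reacting
    weight -60
}

Restart keepalived on both nodes after changing the configuration.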
Shared Storage for Configuration and Content
Ensure all nodes have consistent configuration and content using shared storage or synchronization.
Using rsync for Configuration Sync
#!/bin/bash
# Run this script on the primary server to push its Nginx configuration
# to the backup servers and reload them if the configuration is valid.
BACKUP_SERVERS=("backup1.example.com" "backup2.example.com")

for server in "${BACKUP_SERVERS[@]}"; do
    rsync -avz --delete /etc/nginx/ "$server":/etc/nginx/
    ssh "$server" "nginx -t && systemctl reload nginx"
done
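If a simple periodic sync is enough, a cron entry on the primary will do; this assumes the script above is saved as /usr/local/bin/sync_nginx_config.sh, the same path the inotify watcher below uses:

# /etc/cron.d/nginx-config-sync: push configuration to the backups every 5 minutes
*/5 * * * * root /usr/local/bin/sync_nginx_config.sh >/dev/null 2>&1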
Automated Configuration Sync with inotify
#!/bin/bash
# Watch the Nginx configuration tree and re-run the sync script on any change.
inotifywait -m -r -e modify,create,delete /etc/nginx/ |
while read -r path action file; do
    echo "Detected change: $action $path$file"
    /usr/local/bin/sync_nginx_config.sh
done
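To keep this watcher running across reboots, you can wrap it in a small systemd service. The unit below is a sketch and assumes the loop above is saved as /usr/local/bin/watch_nginx_config.sh:

# /etc/systemd/system/nginx-config-watch.service
[Unit]
Description=Sync Nginx configuration to backups on change
After=network-online.target

[Service]
ExecStart=/usr/local/bin/watch_nginx_config.sh
Restart=always

[Install]
WantedBy=multi-user.target

Enable it with sudo systemctl enable --now nginx-config-watch.service.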
Session Persistence in Load Balanced Environments
When using multiple active nodes, maintain user sessions during failover.
Sticky Sessions with IP Hash
upstream backend {
ip_hash;
server 192.168.1.10:80;
server 192.168.1.11:80;
server 192.168.1.12:80;
}
server {
listen 80;
location / {
proxy_pass http://backend;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
}
}
External Session Storage with Redis
A more robust approach is to have the application store session data in an external store such as Redis. Any backend can then serve any request, so Nginx no longer needs strict affinity; hashing on the session cookie merely keeps a client on one backend while it is healthy and is optional.

upstream backend {
    hash $cookie_jsessionid consistent;   # optional: keep a session on one backend, remap minimally on failure
    server 192.168.1.10:80;
    server 192.168.1.11:80;
}

server {
    listen 80;
    location / {
        proxy_pass http://backend;
        proxy_set_header Host $host;
    }
}
Monitoring and Alerting
Implement comprehensive monitoring to detect issues before they cause outages.
Custom Nginx Monitoring Script
#!/bin/bash
VIRTUAL_IP="192.168.1.100"
PRIMARY_SERVER="192.168.1.10"
BACKUP_SERVER="192.168.1.11"
# Each server must expose /health to the monitoring host (for example in its
# port 80 server block), not only on the loopback-only status server.
check_server() {
    local server=$1
    if curl -s --connect-timeout 5 "http://$server/health" | grep -q "healthy"; then
        return 0
    else
        return 1
    fi
}
# Check virtual IP accessibility
if ! ping -c 1 -W 1 $VIRTUAL_IP >/dev/null 2>&1; then
echo "ALERT: Virtual IP $VIRTUAL_IP is not accessible"
# Send alert via email, Slack, etc.
fi
# Check individual servers
if ! check_server $PRIMARY_SERVER; then
echo "ALERT: Primary server $PRIMARY_SERVER is down"
fi
if ! check_server $BACKUP_SERVER; then
echo "ALERT: Backup server $BACKUP_SERVER is down"
fi
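Where the script notes sending an alert, one lightweight option is posting to a chat webhook. The function below is a sketch; the Slack-style webhook URL is a placeholder you would replace with your own endpoint:

send_alert() {
    local message=$1
    # Placeholder incoming-webhook URL; substitute your own
    curl -s -X POST -H 'Content-type: application/json' \
        --data "{\"text\": \"$message\"}" \
        "https://hooks.slack.com/services/XXX/YYY/ZZZ" >/dev/null
}

You can then call send_alert "ALERT: Primary server $PRIMARY_SERVER is down" alongside the echo statements above.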
Testing Your Failover Setup
Regular testing ensures your failover mechanism works when needed.
Manual Failover Test
# Stop Nginx on primary to trigger failover
sudo systemctl stop nginx
# Monitor virtual IP movement
ping 192.168.1.100
# Check which server now has the virtual IP
ip addr show eth0 | grep 192.168.1.100
# Restore primary and verify failback
sudo systemctl start nginx
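It also helps to know which node actually served a response, not just which one holds the IP. One optional convention (not part of the configuration above) is to add an identifying response header on each node and inspect it from a client:

# In the main server block on each node, e.g. on the primary:
#   add_header X-Served-By primary always;   # use "backup" on the backup server
# Then, from a client machine:
curl -sI http://192.168.1.100/ | grep -i x-served-by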
Automated Failover Testing
#!/bin/bash
echo "Starting failover test..."
# Simulate primary failure
ssh primary-server "sudo systemctl stop nginx"
# Wait for failover
sleep 10
# Verify backup is serving traffic
if curl -f http://192.168.1.100/ >/dev/null 2>&1; then
echo "SUCCESS: Failover completed successfully"
else
echo "FAILURE: Failover failed"
exit 1
fi
# Restore primary
ssh primary-server "sudo systemctl start nginx"
echo "Test completed"
Common Pitfalls
- Split-brain scenario: When both servers think they're primary, caused by network partitions
- Insufficient health checks: Only checking if Nginx process exists, not if it's actually serving requests
- Session data loss: Not implementing shared session storage for stateful applications
- DNS caching: Clients caching DNS records and not failing over to the new IP
- Asymmetric configurations: Differences in configuration between primary and backup servers
- Inadequate monitoring: Not detecting when failover occurs or when servers are unhealthy
- No failback testing: Never testing the process of returning to the primary server after repair
Always test your failover during maintenance windows. Unexpected behavior during actual failures can lead to extended downtime. Document the failover process and train your team on manual intervention procedures.
Summary
You've learned to build a robust high availability Nginx setup using Keepalived for IP failover, comprehensive health checks, and shared configuration management. Remember that high availability is not just about technology—it requires regular testing, monitoring, and well-documented procedures. Your HA setup should be as simple as possible while meeting your availability requirements.