Nginx High Availability and Failover Setup
In this final lesson, we'll explore how to ensure your Nginx infrastructure remains available even when individual components fail. Building on your knowledge of load balancing and monitoring, you'll learn to implement robust high availability solutions that keep your services running smoothly.
Learning Goals:
- Understand high availability concepts and failover strategies
- Implement active-passive Nginx configurations
- Configure health checks and automatic failover
- Set up shared storage and session persistence
- Monitor and test your high availability setup
Understanding High Availability Concepts
High availability (HA) ensures your services remain accessible even during hardware failures, network issues, or maintenance. The key principle is eliminating single points of failure.
Active-Passive vs Active-Active
- Active-Passive: One server handles traffic while the others stand by as backups.
- Active-Active: Multiple servers share the load simultaneously.
For most web applications, active-passive configurations provide the best balance of simplicity and reliability. Active-active requires more complex session management but offers better resource utilization.
Implementing Active-Passive Nginx with Keepalived
Keepalived provides IP failover capabilities, allowing a backup server to automatically take over a virtual IP address when the primary fails.
Installing Keepalived
# Ubuntu/Debian
sudo apt update
sudo apt install keepalived

# CentOS/RHEL
sudo yum install keepalived
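Keepalived runs as an ordinary system service. Once the configuration shown in the next sections is in place on both servers, enable and start it (assuming a systemd-based distribution):

sudo systemctl enable --now keepalived
sudo systemctl status keepalived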
Configuring Keepalived on the Primary Server
Edit /etc/keepalived/keepalived.conf on the primary server:
vrrp_script chk_nginx {
    script "/usr/bin/killall -0 nginx"   # exits non-zero if no nginx process is running
    interval 2                           # run the check every 2 seconds
    weight -60                           # on failure, drop priority below the backup's (100 - 60 < 50)
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass secret123              # keepalived uses only the first 8 characters
    }
    virtual_ipaddress {
        192.168.1.100/24
    }
    track_script {
        chk_nginx
    }
}
Configuring Keepalived on the Backup Server
The backup uses an almost identical configuration, with the BACKUP state and a lower priority. Edit /etc/keepalived/keepalived.conf on the backup server:
vrrp_script chk_nginx {
    script "/usr/bin/killall -0 nginx"
    interval 2
    weight -60
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51                 # must match the primary
    priority 50                          # lower than the primary's priority
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass secret123              # must match the primary
    }
    virtual_ipaddress {
        192.168.1.100/24
    }
    track_script {
        chk_nginx
    }
}
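After both nodes are configured, restart keepalived and confirm that the virtual IP is held by the primary. A minimal check, assuming the interface and addresses used above:

sudo systemctl restart keepalived

# On the primary: the VIP should appear as an additional address on eth0
ip -4 addr show dev eth0 | grep 192.168.1.100

# Watch VRRP state transitions in the logs
sudo journalctl -u keepalived -f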
Health Check Configuration
Robust health checks are crucial for automatic failover. Let's implement comprehensive health monitoring.
Nginx Status Page
server {
    listen 127.0.0.1:8080;
    server_name localhost;

    location /nginx_status {
        stub_status on;                  # expose basic connection and request counters
        access_log off;
        allow 127.0.0.1;
        deny all;
    }

    location /health {
        access_log off;
        allow 127.0.0.1;
        deny all;
        default_type text/plain;
        # Extend with custom health check logic as needed
        return 200 "healthy\n";
    }
}
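From the server itself you can verify that both endpoints respond. The stub_status output below is illustrative; the counters on your server will differ:

curl http://127.0.0.1:8080/health
# healthy

curl http://127.0.0.1:8080/nginx_status
# Active connections: 2
# server accepts handled requests
#  16 16 31
# Reading: 0 Writing: 1 Waiting: 1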
Advanced Health Check Script
#!/bin/bash
# Check if Nginx process is running
if ! killall -0 nginx 2>/dev/null; then
exit 1
fi
# Check if Nginx responds on local port
if ! curl -f http://127.0.0.1:80/ >/dev/null 2>&1; then
exit 1
fi
# Check if status page is accessible
if ! curl -f http://127.0.0.1:8080/nginx_status >/dev/null 2>&1; then
exit 1
fi
exit 0
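To use this script as the keepalived health check instead of the simple process test, install it on both nodes and point the vrrp_script block at it. This assumes you saved the script above as check_nginx.sh; the installed path is only an example:

sudo install -m 755 check_nginx.sh /usr/local/bin/check_nginx.sh

vrrp_script chk_nginx {
    script "/usr/local/bin/check_nginx.sh"
    interval 2
    fall 2                # require two consecutive failures before reacting
    weight -60
}

Restart keepalived on both nodes after changing the configuration.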
Shared Storage for Configuration and Content
Ensure all nodes have consistent configuration and content using shared storage or synchronization.
Using rsync for Configuration Sync
#!/bin/bash
# Run this script on the primary server to push its Nginx configuration
# to the backup servers and reload them if the configuration is valid.
BACKUP_SERVERS=("backup1.example.com" "backup2.example.com")

for server in "${BACKUP_SERVERS[@]}"; do
    rsync -avz --delete /etc/nginx/ "$server":/etc/nginx/
    ssh "$server" "nginx -t && systemctl reload nginx"
done
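If a simple periodic sync is enough, a cron entry on the primary will do; this assumes the script above is saved as /usr/local/bin/sync_nginx_config.sh, the same path the inotify watcher below uses:

# /etc/cron.d/nginx-config-sync: push configuration to the backups every 5 minutes
*/5 * * * * root /usr/local/bin/sync_nginx_config.sh >/dev/null 2>&1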
Automated Configuration Sync with inotify
#!/bin/bash
# Watch the Nginx configuration tree and re-run the sync script on any change.
inotifywait -m -r -e modify,create,delete /etc/nginx/ |
while read -r path action file; do
    echo "Detected change: $action $path$file"
    /usr/local/bin/sync_nginx_config.sh
done
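To keep this watcher running across reboots, you can wrap it in a small systemd service. The unit below is a sketch and assumes the loop above is saved as /usr/local/bin/watch_nginx_config.sh:

# /etc/systemd/system/nginx-config-watch.service
[Unit]
Description=Sync Nginx configuration to backups on change
After=network-online.target

[Service]
ExecStart=/usr/local/bin/watch_nginx_config.sh
Restart=always

[Install]
WantedBy=multi-user.target

Enable it with sudo systemctl enable --now nginx-config-watch.service.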
Session Persistence in Load Balanced Environments
When using multiple active nodes, maintain user sessions during failover.
Sticky Sessions with IP Hash
upstream backend {
ip_hash;
server 192.168.1.10:80;
server 192.168.1.11:80;
server 192.168.1.12:80;
}
server {
listen 80;
location / {
proxy_pass http://backend;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
}
}
External Session Storage with Redis
A more robust approach is to have the application store session data in an external store such as Redis. Any backend can then serve any request, so Nginx no longer needs strict affinity; hashing on the session cookie merely keeps a client on one backend while it is healthy and is optional.

upstream backend {
    hash $cookie_jsessionid consistent;   # optional: keep a session on one backend, remap minimally on failure
    server 192.168.1.10:80;
    server 192.168.1.11:80;
}

server {
    listen 80;
    location / {
        proxy_pass http://backend;
        proxy_set_header Host $host;
    }
}
Monitoring and Alerting
Implement comprehensive monitoring to detect issues before they cause outages.
Custom Nginx Monitoring Script
#!/bin/bash
VIRTUAL_IP="192.168.1.100"
PRIMARY_SERVER="192.168.1.10"
BACKUP_SERVER="192.168.1.11"
# Each server must expose /health to the monitoring host (for example in its
# port 80 server block), not only on the loopback-only status server.
check_server() {
    local server=$1
    if curl -s --connect-timeout 5 "http://$server/health" | grep -q "healthy"; then
        return 0
    else
        return 1
    fi
}
# Check virtual IP accessibility
if ! ping -c 1 -W 1 $VIRTUAL_IP >/dev/null 2>&1; then
echo "ALERT: Virtual IP $VIRTUAL_IP is not accessible"
# Send alert via email, Slack, etc.
fi
# Check individual servers
if ! check_server $PRIMARY_SERVER; then
echo "ALERT: Primary server $PRIMARY_SERVER is down"
fi
if ! check_server $BACKUP_SERVER; then
echo "ALERT: Backup server $BACKUP_SERVER is down"
fi
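Where the script notes sending an alert, one lightweight option is posting to a chat webhook. The function below is a sketch; the Slack-style webhook URL is a placeholder you would replace with your own endpoint:

send_alert() {
    local message=$1
    # Placeholder incoming-webhook URL; substitute your own
    curl -s -X POST -H 'Content-type: application/json' \
        --data "{\"text\": \"$message\"}" \
        "https://hooks.slack.com/services/XXX/YYY/ZZZ" >/dev/null
}

You can then call send_alert "ALERT: Primary server $PRIMARY_SERVER is down" alongside the echo statements above.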
Testing Your Failover Setup
Regular testing ensures your failover mechanism works when needed.
Manual Failover Test
# Stop Nginx on primary to trigger failover
sudo systemctl stop nginx
# Monitor virtual IP movement
ping 192.168.1.100
# Check which server now has the virtual IP
ip addr show eth0 | grep 192.168.1.100
# Restore primary and verify failback
sudo systemctl start nginx
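It also helps to know which node actually served a response, not just which one holds the IP. One optional convention (not part of the configuration above) is to add an identifying response header on each node and inspect it from a client:

# In the main server block on each node, e.g. on the primary:
#   add_header X-Served-By primary always;   # use "backup" on the backup server
# Then, from a client machine:
curl -sI http://192.168.1.100/ | grep -i x-served-by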
Automated Failover Testing
#!/bin/bash
echo "Starting failover test..."
# Simulate primary failure
ssh primary-server "sudo systemctl stop nginx"
# Wait for failover
sleep 10
# Verify backup is serving traffic
if curl -f http://192.168.1.100/ >/dev/null 2>&1; then
echo "SUCCESS: Failover completed successfully"
else
echo "FAILURE: Failover failed"
exit 1
fi
# Restore primary
ssh primary-server "sudo systemctl start nginx"
echo "Test completed"
Common Pitfalls
- Split-brain scenario: When both servers think they're primary, caused by network partitions
- Insufficient health checks: Only checking if Nginx process exists, not if it's actually serving requests
- Session data loss: Not implementing shared session storage for stateful applications
- DNS caching: Clients caching DNS records and not failing over to the new IP
- Asymmetric configurations: Differences in configuration between primary and backup servers
- Inadequate monitoring: Not detecting when failover occurs or when servers are unhealthy
- No failback testing: Never testing the process of returning to the primary server after repair
Always test your failover during maintenance windows. Unexpected behavior during actual failures can lead to extended downtime. Document the failover process and train your team on manual intervention procedures.
Summary
You've learned to build a robust high availability Nginx setup using Keepalived for IP failover, comprehensive health checks, and shared configuration management. Remember that high availability is not just about technology—it requires regular testing, monitoring, and well-documented procedures. Your HA setup should be as simple as possible while meeting your availability requirements.