Monitoring & Troubleshooting in Redis

Redis is renowned for its speed and simplicity, but as with any production system, it’s crucial to monitor its health and swiftly troubleshoot issues. This lesson provides a hands-on guide to monitoring Redis, interpreting metrics, setting up alerts, and diagnosing common problems before they impact your applications.


Table of Contents

  1. Introduction
  2. Why Monitoring Matters
  3. Core Redis Monitoring Tools
  4. Key Metrics to Monitor
  5. Setting Up Alerts
  6. Troubleshooting Common Issues
  7. Common Mistakes and Pitfalls
  8. Summary
  9. Quiz

Introduction

Redis can serve as the backbone of real-time applications, but undetected issues—like memory leaks, blocked clients, or replication lag—can quickly escalate into outages or data loss. This lesson focuses on proactive monitoring and systematic troubleshooting to maintain a reliable Redis deployment.


Why Monitoring Matters

Monitoring provides visibility into Redis's performance and operational health. Effective monitoring helps you:

  • Detect anomalies before they escalate
  • Optimize resource usage
  • Ensure high availability and performance
  • Reduce downtime and data loss

Core Redis Monitoring Tools

1. The INFO Command

Redis exposes internal statistics via the INFO command.

127.0.0.1:6379> INFO

You can request specific sections:

127.0.0.1:6379> INFO memory
127.0.0.1:6379> INFO stats
127.0.0.1:6379> INFO clients

Example Output (partial):

# Memory
used_memory:1048576
used_memory_rss:2097152
mem_fragmentation_ratio:2.00

# Clients
connected_clients:10
blocked_clients:0
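
If you capture INFO output as raw text (for example from redis-cli), it is easy to turn into a dictionary. The sketch below parses the sample output shown above; `parse_info` is a hypothetical helper name, not part of Redis or any client library. It also shows that `mem_fragmentation_ratio` is approximately `used_memory_rss` divided by `used_memory`.

```python
def parse_info(raw: str) -> dict:
    """Parse raw INFO output into {section: {field: value}} (values stay strings)."""
    sections = {}
    current = None
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Section headers look like "# Memory"
            current = line.lstrip("# ").strip().lower()
            sections[current] = {}
        elif ":" in line and current is not None:
            field, _, value = line.partition(":")
            sections[current][field] = value
    return sections


SAMPLE = """\
# Memory
used_memory:1048576
used_memory_rss:2097152
mem_fragmentation_ratio:2.00

# Clients
connected_clients:10
blocked_clients:0
"""

info = parse_info(SAMPLE)
memory = info["memory"]
# Recompute the fragmentation ratio from its two inputs
ratio = int(memory["used_memory_rss"]) / int(memory["used_memory"])
print(f"fragmentation ratio: {ratio:.2f}")  # prints "fragmentation ratio: 2.00"
```

Client libraries such as redis-py already do this parsing for you; the helper is mainly useful for saved snapshots or log attachments.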

2. Redis CLI MONITOR Command

The MONITOR command streams every command processed by the server in real time; useful for debugging or auditing.

127.0.0.1:6379> MONITOR

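Note: MONITOR is very resource-intensive because every command the server processes is also sent to each monitoring client. In production, run it only for short, targeted debugging sessions.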
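
Because MONITOR should only run briefly, one approach is to capture a few seconds of output to a file and analyze it offline. The sketch below counts commands by name; it assumes the standard MONITOR line layout (timestamp, database and client address in brackets, then quoted arguments). `count_commands` is a hypothetical helper, and the sample lines are illustrative.

```python
import re
from collections import Counter

# Matches lines such as: 1339518083.107412 [0 127.0.0.1:60866] "keys" "*"
MONITOR_LINE = re.compile(r'^(\d+\.\d+) \[(\d+) ([^\]]+)\] "([^"]*)"')


def count_commands(lines):
    """Count commands by name (uppercased) in captured MONITOR output."""
    counts = Counter()
    for line in lines:
        match = MONITOR_LINE.match(line.strip())
        if match:  # skips the initial "OK" and any other non-command line
            counts[match.group(4).upper()] += 1
    return counts


sample = [
    "OK",
    '1339518083.107412 [0 127.0.0.1:60866] "keys" "*"',
    '1339518087.877697 [0 127.0.0.1:60866] "dbsize"',
    '1339518090.420270 [0 127.0.0.1:60866] "set" "x" "6"',
    '1339518096.506257 [0 127.0.0.1:60866] "get" "x"',
    '1339518099.363765 [0 127.0.0.1:60866] "set" "y" "7"',
]
top = count_commands(sample).most_common(1)
print(top)  # prints [('SET', 2)]
```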

3. Redis Logs

Redis logs are invaluable for tracking server warnings, restarts, or persistence failures. Check the location in your redis.conf (logfile directive).

4. External Monitoring Systems

  • Prometheus & Grafana: Use the redis_exporter for Prometheus metrics, then visualize in Grafana.
  • Cloud Monitoring: AWS ElastiCache, Azure Cache, and GCP Memorystore offer dashboards and alerts.
  • Third-party SaaS: DataDog, New Relic, etc., offer Redis integrations.

Example: Exporting Metrics to Prometheus

  1. Run redis_exporter:
    ./redis_exporter
  2. Add a scrape job for the exporter under scrape_configs in your Prometheus configuration:
    scrape_configs:
      - job_name: 'redis'
        static_configs:
          - targets: ['localhost:9121']

Key Metrics to Monitor

| Metric                                     | What It Means                        | Why It Matters                 |
|--------------------------------------------|--------------------------------------|--------------------------------|
| used_memory                                | Total memory allocated by Redis      | Detect leaks, plan scaling     |
| connected_clients                          | Number of active connections         | Capacity, possible overload    |
| blocked_clients                            | Clients waiting on blocking commands | May indicate performance issue |
| instantaneous_ops_per_sec                  | Number of ops executed per second    | Throughput, sudden traffic     |
| rdb_last_bgsave_status                     | Last RDB save status                 | Data durability                |
| aof_last_write_status                      | Last AOF write status                | Data durability                |
| rejected_connections                       | Connections rejected due to limits   | May need to tune limits        |
| keyspace_hits, keyspace_misses             | Lookup effectiveness                 | Application efficiency         |
| sync_full, sync_partial_ok, sync_partial_err | Replication health                 | Replica status                 |
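
Lookup effectiveness is usually expressed as a hit ratio: keyspace_hits / (keyspace_hits + keyspace_misses). A minimal sketch of that calculation follows; `hit_ratio` is a hypothetical helper and the counter values are illustrative.

```python
from typing import Optional


def hit_ratio(info: dict) -> Optional[float]:
    """Return keyspace_hits / (keyspace_hits + keyspace_misses), or None if there were no lookups."""
    hits = int(info.get("keyspace_hits", 0))
    misses = int(info.get("keyspace_misses", 0))
    total = hits + misses
    if total == 0:
        return None  # avoid dividing by zero on a fresh server
    return hits / total


stats = {"keyspace_hits": 900, "keyspace_misses": 100}  # illustrative values
print(hit_ratio(stats))  # prints 0.9
```

Both counters are cumulative since the server started (or since the last CONFIG RESETSTAT), so for a current picture you would compare two snapshots and compute the ratio over the difference.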

Example: Fetching Key Metrics via Python

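import redis

# Connect to a local Redis instance (adjust host and port for your deployment)
r = redis.Redis(host='localhost', port=6379)

# INFO with no arguments returns the default sections as a single dictionary
info = r.info()

print("Memory Used:", info['used_memory_human'])
print("Connected Clients:", info['connected_clients'])
print("Ops/sec:", info['instantaneous_ops_per_sec'])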

Setting Up Alerts

Proactive alerting helps you catch and react to problems early.

Example Thresholds

  • Memory usage > 80% of maxmemory
  • Connected clients > 90% of maxclients
  • Blocked clients > 0 for > 1 minute
  • Replication lag > 5 seconds
  • Persistence failures
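
As a rough illustration, some of these thresholds can be checked against a single INFO snapshot. The sketch below is not a replacement for a real alerting system: `evaluate_thresholds` is a hypothetical helper, `maxclients` is assumed to be supplied by the caller (for example from CONFIG GET maxclients), and the duration-based checks (blocked clients for over a minute, replication lag) are left out because they need state across several snapshots.

```python
def evaluate_thresholds(info: dict, maxclients: int) -> list:
    """Return a list of human-readable alert messages for one INFO snapshot."""
    alerts = []

    # Memory: only meaningful when maxmemory is set (0 means "no limit")
    maxmemory = int(info.get("maxmemory", 0))
    used = int(info.get("used_memory", 0))
    if maxmemory > 0 and used / maxmemory > 0.80:
        alerts.append(f"memory at {used / maxmemory:.0%} of maxmemory")

    # Clients: compare against the configured connection limit
    clients = int(info.get("connected_clients", 0))
    if maxclients > 0 and clients / maxclients > 0.90:
        alerts.append(f"clients at {clients / maxclients:.0%} of maxclients")

    # Persistence: any status other than "ok" counts as a failure
    for field in ("rdb_last_bgsave_status", "aof_last_write_status"):
        status = info.get(field, "ok")
        if status != "ok":
            alerts.append(f"{field} is {status}")

    return alerts


snapshot = {  # illustrative values, not real measurements
    "used_memory": 900,
    "maxmemory": 1000,
    "connected_clients": 50,
    "rdb_last_bgsave_status": "err",
    "aof_last_write_status": "ok",
}
alerts = evaluate_thresholds(snapshot, maxclients=10000)
print(alerts)
```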

Example: Prometheus Alert Rule

- alert: RedisMemoryHigh
  expr: redis_memory_used_bytes / redis_memory_max_bytes > 0.8
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Redis memory usage is above 80%"

Troubleshooting Common Issues

1. High Memory Usage

Symptoms: Slow responses, OOM errors, evictions.

Troubleshooting Steps:

  • Check used_memory, maxmemory in INFO memory
  • Identify large keys or key patterns:
    127.0.0.1:6379> MEMORY USAGE mykey
    127.0.0.1:6379> MEMORY STATS
  • Use an external analysis tool such as redis-memory-analyzer, or run redis-cli --bigkeys to sample the largest keys of each type.

Remediation: Adjust data model, apply eviction policy, increase memory.
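
To find large keys programmatically, one option is to walk the keyspace with SCAN and call MEMORY USAGE on each key. The sketch below assumes a redis-py style client that provides `scan_iter` and `memory_usage`; `largest_keys` and its parameters are hypothetical names chosen for illustration.

```python
import heapq


def largest_keys(client, top_n=10, match="*", scan_count=1000):
    """Return the top_n (size_in_bytes, key) pairs, largest first.

    Uses SCAN (incremental, non-blocking) rather than KEYS, and treats a key
    that disappeared between SCAN and MEMORY USAGE as size 0.
    """
    sizes = (
        (client.memory_usage(key) or 0, key)
        for key in client.scan_iter(match=match, count=scan_count)
    )
    return heapq.nlargest(top_n, sizes, key=lambda pair: pair[0])


# Example usage against a live server (requires the redis package):
#   import redis
#   for size, key in largest_keys(redis.Redis(host="localhost", port=6379)):
#       print(size, key)
```

This issues one MEMORY USAGE call per key, so on a large keyspace it is best run against a replica or during a quiet period. For big collections, MEMORY USAGE samples nested elements by default, so the reported sizes are estimates.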


2. High Latency or Slow Commands

Symptoms: Commands take too long, timeouts, blocked clients.

Troubleshooting Steps:

  • Check slowlog:
    127.0.0.1:6379> SLOWLOG GET 5
  • Review blocked_clients in INFO clients
  • Identify slow command patterns.

Remediation: Optimize queries, use pipelining, and avoid long-running O(N) commands (such as KEYS on a large keyspace) that hold up every other client while they execute.
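
To identify slow command patterns, it can help to group slowlog entries by command name. The sketch below assumes entries shaped like those redis-py returns from `slowlog_get()` (dictionaries with `command` and `duration` fields, the latter in microseconds); `summarize_slowlog` is a hypothetical helper and the sample entries are illustrative.

```python
from collections import defaultdict


def summarize_slowlog(entries):
    """Group entries by command name; return (name, count, total_microseconds), slowest first."""
    totals = defaultdict(lambda: [0, 0])  # name -> [count, total microseconds]
    for entry in entries:
        command = entry["command"]
        if isinstance(command, bytes):  # bytes unless decode_responses=True
            command = command.decode("utf-8", errors="replace")
        name = command.split(" ", 1)[0].upper() if command else "?"
        totals[name][0] += 1
        totals[name][1] += int(entry["duration"])
    return sorted(
        ((name, count, micros) for name, (count, micros) in totals.items()),
        key=lambda row: row[2],
        reverse=True,
    )


entries = [  # illustrative values
    {"id": 3, "start_time": 1700000003, "duration": 15000, "command": b"KEYS *"},
    {"id": 2, "start_time": 1700000002, "duration": 12000, "command": b"HGETALL big:hash"},
    {"id": 1, "start_time": 1700000001, "duration": 20000, "command": b"KEYS user:*"},
]
for name, count, micros in summarize_slowlog(entries):
    print(f"{name}: {count} entries, {micros / 1000:.1f} ms total")
```

Slowlog durations cover command execution only, not network or client time, so a clean slowlog alongside high client-side latency points toward the network or the client rather than the server.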


3. Replication Lag

Symptoms: Data on replicas lags behind master.

Troubleshooting Steps:

  • Compare master_repl_offset on the master with slave_repl_offset on each replica (both in INFO replication); the difference is the lag in bytes.
  • Monitor lag in your metrics.

Remediation: Increase network throughput, tune repl-backlog-size so that briefly disconnected replicas can resync partially instead of fully, and spread out heavy write bursts.
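
On the master, INFO replication also lists each connected replica (slave0, slave1, and so on) with its acknowledged offset. The sketch below computes the gap per replica; it assumes the client has already parsed each replica line into a dictionary with `offset`, `lag`, and `state` fields, as redis-py is expected to do. `replica_lag` is a hypothetical helper and the sample values are illustrative.

```python
import re


def replica_lag(master_info: dict) -> dict:
    """Map each replica entry to its offset gap in bytes and seconds since its last ACK."""
    master_offset = int(master_info["master_repl_offset"])
    result = {}
    for field, value in master_info.items():
        # Replica entries are named slave0, slave1, ... in INFO replication
        if re.fullmatch(r"slave\d+", field) and isinstance(value, dict):
            result[field] = {
                "bytes_behind": master_offset - int(value["offset"]),
                "seconds_since_ack": int(value["lag"]),
                "state": value["state"],
            }
    return result


master_info = {  # illustrative snapshot taken on the master
    "role": "master",
    "connected_slaves": 1,
    "slave0": {"ip": "10.0.0.2", "port": 6379, "state": "online", "offset": 9500, "lag": 1},
    "master_repl_offset": 10000,
}
print(replica_lag(master_info))
```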


4. Persistence Failures

Symptoms: RDB or AOF saves are failing.

Troubleshooting Steps:

  • Check rdb_last_bgsave_status and logs for errors.
  • Check disk space and permissions.

Remediation: Free up disk, fix permissions, review config.
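
A disk space check can be scripted as well. Redis writes a new snapshot to a temporary file before replacing the old one, so the data directory needs room for both. The sketch below is a conservative heuristic only: `has_room_for_snapshot` and its `safety_factor` are hypothetical, the directory is assumed to come from CONFIG GET dir, and `used_memory` is used as a rough upper bound for the snapshot size.

```python
import shutil


def has_room_for_snapshot(data_dir: str, dataset_bytes: int, safety_factor: float = 2.0) -> bool:
    """Return True if the filesystem holding data_dir has at least safety_factor * dataset_bytes free."""
    free = shutil.disk_usage(data_dir).free
    return free >= dataset_bytes * safety_factor


# "." stands in for the Redis data directory; 1 MiB stands in for used_memory
print(has_room_for_snapshot(".", 1024 * 1024))
```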


Common Mistakes and Pitfalls

  • Ignoring Slowlog: Failing to monitor slow commands can hide performance bottlenecks.
  • Overusing MONITOR: Running MONITOR long-term in production can degrade performance.
  • Alert Fatigue: Too many alerts lead to ignored warnings; tune thresholds.
  • Not Monitoring Replication Lag: Can result in silent data inconsistency in failover.
  • Blind Spot for Memory Fragmentation: High mem_fragmentation_ratio can waste memory unexpectedly.

Summary

  • Monitoring is vital for Redis reliability and stability.
  • Use built-in commands, logs, and external systems for effective monitoring.
  • Track key metrics and set actionable alert thresholds.
  • Systematic troubleshooting helps address memory, latency, replication, and persistence issues.
  • Avoid common monitoring and troubleshooting pitfalls.
  • Integrate monitoring and alerting with your operational playbook.

Quiz

  1. Which Redis command provides detailed server statistics and metrics?

    • A) SLOWLOG
    • B) MONITOR
    • C) INFO
    • D) CONFIG

    Answer: C) INFO

  2. What does a non-zero blocked_clients metric typically indicate?

    • A) Clients are idle
    • B) Clients are waiting on blocking commands
    • C) Clients are disconnected
    • D) Clients have exceeded maxmemory

    Answer: B) Clients are waiting on blocking commands

  3. Why should you avoid running the MONITOR command for long periods in production?

    • A) It disables persistence
    • B) It is resource-intensive and can impact server performance
    • C) It deletes keys in real time
    • D) It resets all statistics

    Answer: B) It is resource-intensive and can impact server performance

  4. Which metric indicates the effectiveness of your key lookups in Redis?

    • A) used_memory
    • B) keyspace_hits and keyspace_misses
    • C) connected_clients
    • D) rdb_last_bgsave_status

    Answer: B) keyspace_hits and keyspace_misses

  5. What is a common cause of replication lag in Redis?

    • A) Too many slowlog entries
    • B) Network congestion or heavy write workload
    • C) High keyspace_hits
    • D) Low memory usage

    Answer: B) Network congestion or heavy write workload


Continue to the next lesson to learn about advanced Redis operations and tooling.