
Backup and Disaster Recovery

In this lesson, we'll explore how to implement robust backup and disaster recovery strategies for your Grafana instance. As Grafana becomes central to your monitoring and observability workflows, ensuring its availability and data integrity is crucial for maintaining operational visibility.

Learning Goals

  • Understand what Grafana components need backup protection
  • Implement automated backup strategies
  • Configure disaster recovery procedures
  • Test and validate backup integrity
  • Monitor backup health and performance

Understanding Grafana's Data Architecture

Grafana stores different types of data across multiple components:

  • Configuration data (dashboards, datasources, users, organizations)
  • Application data (SQL database for settings, users, dashboards)
  • External dependencies (data sources, alerting systems)
Note

Grafana itself doesn't store time series data - that remains in your data sources (Prometheus, InfluxDB, etc.). Your backup strategy should include those systems separately.

Backing Up Grafana Configuration

Database Backups

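Grafana stores its state in SQLite (the default), MySQL, or PostgreSQL, so regular database backups are essential. The script below assumes PostgreSQL with a database named grafana_db and reads the password from the GRAFANA_DB_PASSWORD environment variable: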

backup-grafana-db.sh
#!/bin/bash
set -euo pipefail

BACKUP_DIR="/opt/grafana/backups"
DATE=$(date +%Y%m%d_%H%M%S)
mkdir -p "$BACKUP_DIR"

# Dump the Grafana database as plain SQL (restorable with psql)
PGPASSWORD="$GRAFANA_DB_PASSWORD" pg_dump -h localhost -U grafana grafana_db > "$BACKUP_DIR/grafana_backup_$DATE.sql"

# Delete dumps older than 30 days
find "$BACKUP_DIR" -name "grafana_backup_*.sql" -mtime +30 -delete
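
The script above covers PostgreSQL only. For the default SQLite database, copying the file while Grafana is writing to it can produce an inconsistent copy, so a snapshot taken through the sqlite3 CLI's `.backup` command is usually safer. The following is a minimal sketch, assuming the `sqlite3` CLI is installed and that the database lives at a path such as /var/lib/grafana/grafana.db; the function names are hypothetical.

```typescript
import { execFileSync } from 'child_process';

// Format a date like `date +%Y%m%d_%H%M%S` (local time), matching the shell scripts.
function backupTimestamp(d: Date): string {
  const p = (n: number): string => ('0' + n).slice(-2);
  return (
    `${d.getFullYear()}${p(d.getMonth() + 1)}${p(d.getDate())}` +
    `_${p(d.getHours())}${p(d.getMinutes())}${p(d.getSeconds())}`
  );
}

// Build the sqlite3 invocation without running it, so it can be logged or inspected.
// Note: the target path must not contain a single quote.
function sqliteBackupCommand(
  dbPath: string,
  backupDir: string,
  now: Date
): { file: string; args: string[] } {
  const file = `${backupDir}/grafana_backup_${backupTimestamp(now)}.db`;
  return { file, args: [dbPath, `.backup '${file}'`] };
}

// Run the snapshot. Assumes the `sqlite3` CLI is on PATH.
function backupSqlite(dbPath: string, backupDir: string): string {
  const { file, args } = sqliteBackupCommand(dbPath, backupDir, new Date());
  execFileSync('sqlite3', args, { stdio: 'pipe' });
  return file;
}
```

The resulting file uses the same grafana_backup_<timestamp>.db naming that the restore script later in this lesson looks for.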

Configuration Files

Back up your Grafana configuration files:

backup-config.sh
#!/bin/bash
set -euo pipefail

BACKUP_DIR="/opt/grafana/backups"
DATE=$(date +%Y%m%d_%H%M%S)
mkdir -p "$BACKUP_DIR"

# Back up main configuration
cp /etc/grafana/grafana.ini "$BACKUP_DIR/grafana.ini_$DATE"

# Back up provisioning configurations
tar -czf "$BACKUP_DIR/provisioning_$DATE.tar.gz" /etc/grafana/provisioning/

# Back up plugins (if custom plugins exist)
tar -czf "$BACKUP_DIR/plugins_$DATE.tar.gz" /var/lib/grafana/plugins/
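
The database script prunes dumps older than 30 days, but the configuration script has no equivalent step, so its archives accumulate. One possible approach is to select expired files by the timestamp embedded in their names. This sketch is an assumption about how you might do it, not part of Grafana; the function names are hypothetical, and unlike `find -mtime` it relies on the file name rather than the modification time.

```typescript
// Matches the timestamp produced by `date +%Y%m%d_%H%M%S` in the backup file names.
const STAMP = /(\d{4})(\d{2})(\d{2})_(\d{2})(\d{2})(\d{2})/;

// Parse the embedded timestamp (local time); returns null for unrelated files.
function stampOf(fileName: string): Date | null {
  const m = STAMP.exec(fileName);
  if (!m) return null;
  const [y, mo, d, h, mi, s] = m.slice(1).map(Number);
  return new Date(y, mo - 1, d, h, mi, s);
}

// Return the file names whose embedded timestamp is older than `maxAgeDays`.
function expiredBackups(fileNames: string[], now: Date, maxAgeDays: number): string[] {
  const cutoff = now.getTime() - maxAgeDays * 24 * 60 * 60 * 1000;
  return fileNames.filter((name) => {
    const stamp = stampOf(name);
    return stamp !== null && stamp.getTime() < cutoff;
  });
}
```

The returned names could then be deleted from the backup directory, for example with `fs.unlinkSync`, after logging what is about to be removed.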

Automated Backup Strategies

Using Grafana API for Configuration Backup

The Grafana API provides programmatic access to export dashboards, datasources, and other configurations:

grafana-backup.ts
import axios from 'axios';
import * as fs from 'fs';

class GrafanaBackup {
  private baseURL: string;
  private apiKey: string;

  constructor(baseURL: string, apiKey: string) {
    this.baseURL = baseURL;
    this.apiKey = apiKey;
  }

  async backupDashboards(): Promise<void> {
    // List dashboards only (type=dash-db excludes folders)
    const response = await axios.get(`${this.baseURL}/api/search`, {
      headers: { 'Authorization': `Bearer ${this.apiKey}` },
      params: { type: 'dash-db' }
    });

    // Make sure the target directory exists before writing
    fs.mkdirSync('backups/dashboards', { recursive: true });

    for (const dashboard of response.data) {
      const dashboardData = await axios.get(
        `${this.baseURL}/api/dashboards/uid/${dashboard.uid}`,
        { headers: { 'Authorization': `Bearer ${this.apiKey}` } }
      );

      // Save the full API response (dashboard JSON plus meta), one file per UID
      fs.writeFileSync(
        `backups/dashboards/${dashboard.uid}.json`,
        JSON.stringify(dashboardData.data, null, 2)
      );
    }
  }

  async backupDataSources(): Promise<void> {
    const response = await axios.get(`${this.baseURL}/api/datasources`, {
      headers: { 'Authorization': `Bearer ${this.apiKey}` }
    });

    fs.mkdirSync('backups/datasources', { recursive: true });
    fs.writeFileSync(
      'backups/datasources/datasources.json',
      JSON.stringify(response.data, null, 2)
    );
  }
}

// Usage (wrapped in a function so it also runs where top-level await is unavailable)
async function main(): Promise<void> {
  const backup = new GrafanaBackup('http://localhost:3000', 'your-api-key');
  await backup.backupDashboards();
  await backup.backupDataSources();
}

main().catch((err) => {
  console.error('Backup failed:', err);
  process.exit(1);
});
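
The search endpoint returns a limited number of hits per request, so on a large instance a single call may silently miss dashboards. The endpoint accepts `limit` and `page` query parameters (pages are numbered from 1). The following sketch pages through results until a short page comes back; the fetcher is injected so the loop is not tied to any HTTP client, and all names here are hypothetical.

```typescript
// Minimal shape of a search hit; only `uid` is needed for the backup loop.
interface SearchHit {
  uid: string;
}

// A function that returns one page of results (pages are numbered from 1).
type PageFetcher = (page: number, limit: number) => Promise<SearchHit[]>;

// Keep requesting pages until one comes back shorter than `limit`.
async function fetchAllHits(fetchPage: PageFetcher, limit = 500): Promise<SearchHit[]> {
  const all: SearchHit[] = [];
  for (let page = 1; ; page++) {
    const hits = await fetchPage(page, limit);
    all.push(...hits);
    if (hits.length < limit) break;
  }
  return all;
}
```

With axios, the fetcher could wrap the same `/api/search` call used above and pass `{ type: 'dash-db', limit, page }` as `params`, returning `response.data`.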

Cron-based Automated Backups

Set up scheduled backups using cron:

/etc/cron.d/grafana-backup
# Daily full backup at 2 AM
0 2 * * * grafana /opt/grafana/scripts/full-backup.sh

# Incremental dashboard backups every 6 hours
0 */6 * * * grafana /opt/grafana/scripts/dashboard-backup.sh

# Weekly verification of backups
0 3 * * 0 grafana /opt/grafana/scripts/verify-backups.sh
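
The cron entries reference wrapper scripts that this lesson does not show. Whatever they contain, cron only sees the job's output and exit status, so a wrapper should run every step and still exit non-zero if any of them failed. A minimal sketch of that idea follows; the step names and the `BackupStep` shape are hypothetical.

```typescript
interface BackupStep {
  name: string;
  run: () => void; // should throw on failure
}

interface StepResult {
  name: string;
  ok: boolean;
  error?: string;
}

// Run every step even if an earlier one fails, so one broken step
// does not prevent the remaining artifacts from being backed up.
function runBackupSteps(steps: BackupStep[]): StepResult[] {
  return steps.map((step) => {
    try {
      step.run();
      return { name: step.name, ok: true };
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err);
      return { name: step.name, ok: false, error };
    }
  });
}

// Exit code for the cron job: 0 only when every step succeeded.
function exitCodeFor(results: StepResult[]): number {
  return results.every((r) => r.ok) ? 0 : 1;
}
```

A wrapper would typically print the failed steps and then call `process.exit(exitCodeFor(results))` so that failures are visible to whatever watches the cron job.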

Disaster Recovery Procedures

Complete Grafana Restoration

When disaster strikes, follow this recovery procedure:

restore-grafana.sh
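#!/bin/bash
set -e

RESTORE_DATE=${1:-latest}
BACKUP_DIR="/opt/grafana/backups"

# "latest" is not a real file suffix: resolve it to the newest grafana.ini timestamp
if [ "$RESTORE_DATE" = "latest" ]; then
  RESTORE_DATE=$(ls "$BACKUP_DIR"/grafana.ini_* | sed 's/.*grafana\.ini_//' | sort | tail -n 1)
fi

# Check that a database backup exists before stopping anything
SQL_BACKUP="$BACKUP_DIR/grafana_backup_$RESTORE_DATE.sql"
SQLITE_BACKUP="$BACKUP_DIR/grafana_backup_$RESTORE_DATE.db"
if [ ! -f "$SQL_BACKUP" ] && [ ! -f "$SQLITE_BACKUP" ]; then
  echo "No database backup found for: $RESTORE_DATE" >&2
  exit 1
fi

echo "Starting Grafana restoration from backup: $RESTORE_DATE"

# Stop Grafana service
systemctl stop grafana-server

# Restore database (the .sql file is a plain dump, so load it into an empty database)
if [ -f "$SQL_BACKUP" ]; then
  PGPASSWORD="$GRAFANA_DB_PASSWORD" psql -h localhost -U grafana -d grafana_db -f "$SQL_BACKUP"
else
  cp "$SQLITE_BACKUP" /var/lib/grafana/grafana.db
fi

# Restore configuration
cp "$BACKUP_DIR/grafana.ini_$RESTORE_DATE" /etc/grafana/grafana.ini

# Restore provisioning
tar -xzf "$BACKUP_DIR/provisioning_$RESTORE_DATE.tar.gz" -C /

# Start Grafana
systemctl start grafana-server
systemctl status --no-pager grafana-server

echo "Grafana restoration completed"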
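
The restore script expects the database dump, the grafana.ini copy, and the provisioning archive to share one timestamp, so a value such as `latest` is safest when it resolves to the newest timestamp for which all three exist. The sketch below shows one way to pick that timestamp from a directory listing. It assumes all artifacts of a backup run were written with the same timestamp, which holds only if a single wrapper script sets it; the function names are hypothetical.

```typescript
// Artifacts that the restore script expects for a given timestamp.
// The database backup may be a PostgreSQL .sql dump or a SQLite .db copy.
function isCompleteSet(stamp: string, files: Set<string>): boolean {
  const hasDb =
    files.has(`grafana_backup_${stamp}.sql`) || files.has(`grafana_backup_${stamp}.db`);
  return (
    hasDb &&
    files.has(`grafana.ini_${stamp}`) &&
    files.has(`provisioning_${stamp}.tar.gz`)
  );
}

// Return the newest timestamp for which every required artifact exists, or null.
function latestCompleteBackup(fileNames: string[]): string | null {
  const files = new Set(fileNames);
  const stamps = new Set<string>();
  for (const name of fileNames) {
    const m = /(\d{8}_\d{6})/.exec(name);
    if (m) stamps.add(m[1]);
  }
  // YYYYMMDD_HHMMSS sorts chronologically as plain text.
  const sorted = Array.from(stamps).sort().reverse();
  for (const stamp of sorted) {
    if (isCompleteSet(stamp, files)) return stamp;
  }
  return null;
}
```

Passing the output of `fs.readdirSync` on the backup directory to `latestCompleteBackup` yields a value that can be handed to the restore script as its first argument.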

Individual Component Restoration

For partial restorations (e.g., single dashboard):

restore-dashboard.ts
import axios from 'axios';
import * as fs from 'fs';

async function restoreDashboard(backupFile: string, grafanaUrl: string, apiKey: string): Promise<void> {
  // The backup file holds the full API response, so the dashboard JSON sits under `dashboard`
  const dashboardData = JSON.parse(await fs.promises.readFile(backupFile, 'utf8'));

  const response = await axios.post(
    `${grafanaUrl}/api/dashboards/db`,
    {
      dashboard: dashboardData.dashboard,
      overwrite: true,
      message: `Restored from backup ${new Date().toISOString()}`
    },
    {
      headers: {
        'Authorization': `Bearer ${apiKey}`,
        'Content-Type': 'application/json'
      }
    }
  );

  console.log(`Dashboard restored: ${response.data.slug}`);
}
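
When a dashboard is restored into a different or freshly installed instance, the numeric `id` stored in the export belongs to the old database. Grafana's HTTP API documentation describes sending `id: null` to create a dashboard, with the `uid` keeping its identity stable, and accepts a `folderUid` to place it in a folder. The sketch below prepares such a payload from an exported file; the interface and function names are hypothetical, and it assumes the export contains `meta.folderUid`.

```typescript
// Shape of a file written by the dashboard backup: the raw API response.
interface ExportedDashboard {
  dashboard: { id?: number | null; uid?: string; title?: string; [key: string]: unknown };
  meta?: { folderUid?: string; [key: string]: unknown };
}

// Request body for POST /api/dashboards/db.
interface ImportPayload {
  dashboard: ExportedDashboard['dashboard'];
  folderUid?: string;
  overwrite: boolean;
  message: string;
}

// Build a payload that does not depend on instance-specific numeric ids.
function toImportPayload(exported: ExportedDashboard, message: string): ImportPayload {
  const payload: ImportPayload = {
    // Copy the dashboard and clear the numeric id; the uid is kept as-is.
    dashboard: { ...exported.dashboard, id: null },
    overwrite: true,
    message,
  };
  // Preserve the folder when the export recorded one.
  const folderUid = exported.meta?.folderUid;
  if (folderUid) payload.folderUid = folderUid;
  return payload;
}
```

The target folder has to exist on the new instance before the dashboard is posted, so folders would need to be restored first.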

Testing Backup Integrity

Warning

Never assume your backups work without testing! Regularly validate backup integrity to avoid unpleasant surprises during actual recovery scenarios.

Automated Backup Testing

test-backup.ts
import { execFileSync } from 'child_process';
import * as fs from 'fs';

class BackupTester {
  testDatabaseBackup(backupFile: string): boolean {
    try {
      if (backupFile.endsWith('.sql')) {
        // Plain-SQL dumps cannot be read by pg_restore (it only handles the
        // custom, directory, and tar formats). Check instead for the trailer
        // that pg_dump writes when a plain dump finishes.
        const dump = fs.readFileSync(backupFile, 'utf8');
        if (!dump.includes('PostgreSQL database dump complete')) {
          throw new Error('dump is empty or truncated');
        }
      } else if (backupFile.endsWith('.db')) {
        // Test SQLite backup; execFileSync avoids passing the file name through a shell
        const result = execFileSync('sqlite3', [backupFile, 'PRAGMA integrity_check;'], { encoding: 'utf8' });
        if (result.trim() !== 'ok') {
          throw new Error(result);
        }
      }
      return true;
    } catch (error) {
      console.error(`Backup test failed for ${backupFile}:`, error);
      return false;
    }
  }

  // Count the dashboard files that parse as JSON and contain a titled dashboard
  testDashboardBackups(backupDir: string): number {
    const files = fs.readdirSync(backupDir);
    let validCount = 0;

    files.forEach((file) => {
      if (file.endsWith('.json')) {
        try {
          const content = JSON.parse(fs.readFileSync(`${backupDir}/${file}`, 'utf8'));
          if (content.dashboard && content.dashboard.title) {
            validCount++;
          }
        } catch (error) {
          console.error(`Invalid dashboard backup: ${file}`);
        }
      }
    });

    return validCount;
  }
}
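
Structural checks like these confirm that a file parses, but they do not show whether it changed after it was written, for example during a copy to offsite storage. A common complement is to record a checksum at backup time and compare it later. The following is a minimal sketch using Node's built-in crypto module; the manifest format (a map from file path to SHA-256 digest) and the function names are assumptions for illustration.

```typescript
import { createHash } from 'crypto';
import * as fs from 'fs';

// Compute the SHA-256 digest of a file as a hex string.
// Reads the whole file into memory, which is fine for typical Grafana dumps;
// very large files would be better handled with a stream.
function sha256OfFile(path: string): string {
  return createHash('sha256').update(fs.readFileSync(path)).digest('hex');
}

// `manifest` maps each backup file path to the digest recorded at backup time.
// Returns the paths that are missing or whose content no longer matches.
function verifyChecksums(manifest: Record<string, string>): string[] {
  const failures: string[] = [];
  for (const file of Object.keys(manifest)) {
    if (!fs.existsSync(file) || sha256OfFile(file) !== manifest[file]) {
      failures.push(file);
    }
  }
  return failures;
}
```

An empty result means every listed file is present and unchanged; anything else is worth alerting on before the backup is needed.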

Monitoring Backup Health

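Create a dedicated dashboard to monitor your backup system. The queries below assume that your backup jobs export the grafana_backup_* metrics to Prometheus; Grafana does not provide them out of the box: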

backup-monitoring-dashboard.json
{
  "dashboard": {
    "title": "Backup System Monitoring",
    "panels": [
      {
        "title": "Backup Success Rate",
        "type": "stat",
        "targets": [{
          "expr": "sum(grafana_backup_success_total) / sum(grafana_backup_attempts_total) * 100",
          "legendFormat": "Success Rate"
        }]
      },
      {
        "title": "Backup Duration",
        "type": "timeseries",
        "targets": [{
          "expr": "grafana_backup_duration_seconds",
          "legendFormat": "Backup Duration"
        }]
      },
      {
        "title": "Backup Age",
        "type": "stat",
        "targets": [{
          "expr": "(time() - grafana_last_successful_backup_timestamp) / 3600",
          "legendFormat": "Hours since last backup"
        }]
      }
    ]
  }
}
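
One way to produce the metrics that these panels query is to have the backup job write them in the Prometheus text exposition format, for instance to a file picked up by node_exporter's textfile collector. That setup is an assumption, not something this lesson prescribes; the `BackupStats` shape and function name below are hypothetical, while the metric names are taken from the dashboard.

```typescript
// Counters and gauges that the backup job keeps track of between runs.
interface BackupStats {
  attempts: number;                // total runs so far
  successes: number;               // total successful runs so far
  lastDurationSeconds: number;     // duration of the most recent run
  lastSuccessEpochSeconds: number; // Unix time of the most recent success
}

// Render the stats in the Prometheus text exposition format.
function renderBackupMetrics(s: BackupStats): string {
  const lines = [
    '# TYPE grafana_backup_attempts_total counter',
    `grafana_backup_attempts_total ${s.attempts}`,
    '# TYPE grafana_backup_success_total counter',
    `grafana_backup_success_total ${s.successes}`,
    '# TYPE grafana_backup_duration_seconds gauge',
    `grafana_backup_duration_seconds ${s.lastDurationSeconds}`,
    '# TYPE grafana_last_successful_backup_timestamp gauge',
    `grafana_last_successful_backup_timestamp ${s.lastSuccessEpochSeconds}`,
  ];
  return lines.join('\n') + '\n';
}
```

With the textfile collector, the output would be written to a `.prom` file in the directory given by node_exporter's `--collector.textfile.directory` flag. Because the two counters must keep growing, the job needs to persist `attempts` and `successes` between runs.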

Common Pitfalls

  • Incomplete backup scope: Remember to back up both the database AND the configuration files
  • No recovery testing: Backups are useless if you can't restore them - test regularly
  • Ignoring API tokens and secrets: Backed-up configuration files contain API keys and other secrets, so restrict access to the backups and encrypt them
  • Single storage location: Store backups in multiple locations (local, cloud, offsite)
  • No backup monitoring: Implement alerts for backup failures and aging backups
  • Forgetting plugin data: Custom plugins and their configurations need backup too
  • Insufficient retention: Ensure backup retention matches your RPO (Recovery Point Objective)
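
Because backups carry the secrets mentioned above, encrypting them before they leave the host is one way to protect them. The sketch below uses AES-256-GCM from Node's built-in crypto module. It is an illustration under assumptions: the blob layout (IV, then auth tag, then ciphertext) and the function names are made up for this example, and key storage and rotation are out of scope.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from 'crypto';

// Encrypt a backup with a 32-byte key.
// Layout of the result: 12-byte IV | 16-byte auth tag | ciphertext.
function encryptBackup(plain: Buffer, key: Buffer): Buffer {
  const iv = randomBytes(12); // a fresh IV for every encryption
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  const body = Buffer.concat([cipher.update(plain), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), body]);
}

// Decrypt a blob produced by encryptBackup.
// Throws if the key is wrong or the data was modified.
function decryptBackup(blob: Buffer, key: Buffer): Buffer {
  const iv = blob.subarray(0, 12);
  const tag = blob.subarray(12, 28);
  const decipher = createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(blob.subarray(28)), decipher.final()]);
}
```

GCM also authenticates the data, so a corrupted or tampered backup fails at decryption time instead of restoring silently.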

Summary

A robust Grafana backup and disaster recovery strategy involves:

  • Regular database and configuration file backups
  • Automated backup processes with proper scheduling
  • Comprehensive testing of backup integrity
  • Clear restoration procedures for different scenarios
  • Monitoring and alerting for backup system health
  • Documentation and regular drills of recovery processes

Remember that your observability system is only as reliable as its ability to recover from failures.

Quiz
  1. What are the two main types of data you should back up in Grafana?

    • Configuration data (dashboards, datasources) and application database
    • Only time series data from Prometheus
    • Just the Grafana binary and plugins
  2. Why is it important to test backup integrity regularly?

    • To ensure backups are actually usable during recovery
    • Because backups always work perfectly
    • Testing is optional for simple setups
  3. What critical component do many administrators forget to include in their Grafana backup strategy?

    • API tokens and secrets in configuration files
    • The Grafana version number
    • Browser cookies for dashboard access
  4. How can you programmatically back up Grafana dashboards?

    • Using the Grafana HTTP API with appropriate authentication
    • Copying the browser localStorage
    • Exporting screenshots of each dashboard
  5. What's the purpose of monitoring backup age and success rates?

    • To proactively identify backup failures and ensure RPO compliance
    • Because metrics look nice on dashboards
    • It's required for Grafana Enterprise licensing

Answers:

  1. Configuration data (dashboards, datasources) and application database
  2. To ensure backups are actually usable during recovery
  3. API tokens and secrets in configuration files
  4. Using the Grafana HTTP API with appropriate authentication
  5. To proactively identify backup failures and ensure RPO compliance