Operations

Daily Operations

Quick Health Check

# Check all service statuses
sudo systemctl status backup-*

# Check disk space
df -h /var/lib/backup/repo

# View recent snapshots
xreplicator snapshots --server localhost:50051

Monitor these daily:

  • Service status for all components
  • Logs for errors or warnings
  • Disk space on the backup server
  • Confirmation that backups are completing
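
The log check in the list above can be partly automated. The following is a minimal sketch, not part of the product: the function name count_log_issues is hypothetical, and it assumes the logs are available as a plain-text file in which problem lines contain the words "error" or "warn".

```shell
#!/bin/sh
# Hypothetical helper: count the lines in a log file that mention an error
# or a warning (case-insensitive match on "error" or "warn").
# Usage: count_log_issues /path/to/logfile
count_log_issues() (
    # grep -c prints the number of matching lines. It exits with status 1
    # when nothing matches, so "|| true" keeps that case from being
    # reported as a failure.
    grep -ciE 'error|warn' "$1" || true
)
```

A result other than 0 means the log is worth reading by hand; the count alone does not say how serious the entries are.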

Backup Strategies

Full Backup Frequency

Frequency         Recommended For
Daily             Critical systems with high change rates
Weekly            Most production systems (recommended)
Monthly           Archival or low-change systems

Incremental Backup Frequency

Frequency         Recommended For
Hourly            Critical databases, high-change systems
Every 2–4 hours   Typical production systems
Daily             Low-change systems

Retention Policy Guidelines

  • Full backups: Keep 4–12 (1–3 months of history with weekly full backups)
  • Incremental backups: Keep 24–168 (1–7 days of hourly snapshots)
  • Adjust based on storage capacity and recovery time objectives
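
The retention counts above follow from simple arithmetic: the number of snapshots to keep is the length of the history window divided by the backup interval. A small sketch of that calculation, where snapshots_to_keep is a hypothetical helper name rather than a product command:

```shell
#!/bin/sh
# Hypothetical helper: number of snapshots needed to cover a history window.
# $1 = days of history to keep
# $2 = hours between backups (1 for hourly, 168 for weekly)
snapshots_to_keep() {
    # Integer arithmetic: (days * 24 hours) / interval in hours
    echo $(( $1 * 24 / $2 ))
}
```

For example, snapshots_to_keep 7 1 gives the 168 hourly incrementals for a 7-day window, and snapshots_to_keep 28 168 gives the 4 weekly full backups for roughly one month.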

Maintenance Schedule

Weekly

  • Review backup logs for errors
  • Check repository disk space
  • Verify cloud sync status (if configured)
  • Test a restore from a recent snapshot

Monthly

  • Review and adjust retention policies
  • Run compaction (if not automated)
  • Verify license expiry date
  • Review and update configurations

Quarterly

  • Perform a full disaster recovery test
  • Review and update backup strategies
  • Audit access and permissions
  • Update software packages

Key Metrics to Monitor

Metric                        Alert Threshold
Backup success/failure rate   Any failure
Repository disk usage         80% full
License expiry                30 days before expiry
Agent connectivity            On disconnection
Cloud sync status             On failure (if configured)
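
Two of these thresholds, disk usage and license expiry, are plain numeric comparisons and are easy to script. The sketch below is an illustration only: the function names are hypothetical, and how the license expiry date is obtained is not covered by this page, so days_until simply takes two Unix timestamps.

```shell
#!/bin/sh
# Hypothetical helpers for threshold checks; none of these are product commands.

# Print the usage percentage (digits only) of the filesystem holding path $1.
disk_usage_pct() {
    # df -P produces the portable POSIX layout: the second line holds the
    # filesystem data and its fifth column is the capacity, e.g. "42%".
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed (exit status 0) when usage $1 has reached threshold $2.
usage_at_or_above() {
    [ "$1" -ge "$2" ]
}

# Print the whole days between two Unix timestamps: $1 = expiry, $2 = now.
days_until() {
    echo $(( ($1 - $2) / 86400 ))
}
```

A daily job could call usage_at_or_above "$(disk_usage_pct /var/lib/backup/repo)" 80 and raise an alert when it succeeds, and compare the output of days_until against 30 for the license.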

Best Practices

Configuration

  • Use a consistent fixed_block_size_mb across all agents and the server
  • Keep chunk_size_avg_kb consistent to preserve deduplication
  • Enable TLS for all production gRPC connections
  • Store cloud credentials securely — avoid hardcoding in config files
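
Consistency of settings such as fixed_block_size_mb and chunk_size_avg_kb can be checked mechanically once copies of the config files are collected in one place. This is a rough sketch under an assumption this page does not confirm: that each setting appears on its own line as "key: value" or "key = value". The function names are hypothetical, and trailing inline comments on a value are not handled.

```shell
#!/bin/sh
# Hypothetical helpers; the config file layout is an assumption.

# Print each distinct value of setting $1 found in the remaining arguments
# (paths to config files), one value per line.
distinct_values() (
    key=$1
    shift
    # Keep lines of the form "key: value" or "key = value", then strip
    # everything up to the separator and any trailing whitespace.
    grep -hE "^[[:space:]]*${key}[[:space:]]*[:=]" "$@" |
        sed -E 's/^[^:=]*[:=][[:space:]]*//; s/[[:space:]]*$//' |
        sort -u
)

# Succeed when the setting has exactly one distinct value across all files.
setting_is_consistent() (
    [ "$(distinct_values "$@" | grep -c .)" = "1" ]
)
```

When setting_is_consistent fails, running distinct_values with the same arguments shows which values are in use.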

Security

  • Restrict network access to the backup server (port 50051)
  • Use TLS encryption for gRPC communication
  • Use strong, unique credentials for cloud storage
  • Rotate credentials on a regular schedule

Performance

  • Match pipeline settings (workers, batch_size, max_pipeline_memory_mb) to available resources
  • Use compression for network-backed storage
  • Enable eBPF change tracking for faster incremental backups
  • Monitor and tune batch sizes based on actual network throughput

Reliability

  • Test restores regularly — a backup you haven’t restored is an untested backup
  • Maintain multiple full backups
  • Use cloud sync for offsite/disaster recovery backups
  • Document your recovery procedures and keep them up to date
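
A restore test is only useful if the restored data is compared against a known-good copy. The sketch below covers just that comparison step and assumes the restore has already been written to a scratch directory; the restore command itself is not shown here, and verify_restore is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: compare a restored directory tree with a reference copy.
# $1 = reference directory, $2 = restored directory
# Succeeds when both trees contain the same files with identical contents;
# otherwise prints the paths that differ or exist on only one side.
verify_restore() {
    # -r walks subdirectories, -q reports only whether files differ.
    diff -r -q "$1" "$2"
}
```

This checks file contents only. Ownership, permissions, and timestamps would need a separate check if they matter for recovery.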

Documentation

  • Maintain a list of all backed-up systems and their schedules
  • Document the restore procedure for each system type
  • Keep copies of configuration files in version control
  • Record any configuration changes with the reason and date