Azure-to-Azure DR

Overview

XReplicator v1.3.0 can set up disaster recovery (DR) for Azure workloads that recover into Azure. The web UI orchestrates the DR flow: target staging disks are kept in sync with source snapshots, readiness is verified through a precheck, and a failover blueprint is used to create the recovered Azure VM in the target landing zone.

In observed Azure-to-Azure failover drills, the recovered server could be started in under 2 minutes when the staging disks were already healthy and synced. Actual RTO depends on Azure control-plane latency, VM size, OS boot time, networking, and application startup.

When to Use This

Use this flow when:

  • The protected workload runs on Azure.
  • The DR landing zone is also Azure.
  • You want XReplicator to manage DR readiness and failover orchestration.
  • You can keep staging disks attached to the DR-side environment and updated by XReplicator.

For AWS or GCP, use the cloud-specific cold DR runbooks until provider-level orchestration is added.

Architecture

The v1.3.0 Azure-to-Azure flow uses these components:

  • Source VM agents back up OS and data disks.
  • Backup server stores snapshots and coordinates DR state.
  • DR manager keeps target staging disks synced to the desired snapshot.
  • Web UI manages sources, cloud targets, blueprints, precheck, and failover.
  • Azure target config stores the target subscription, region, resource group, network, and service principal details.
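To make the Azure target config concrete, here is a minimal sketch of the fields it holds and a basic completeness check. The class name, field names, and the meaning of "healthy" (every field filled in) are hypothetical illustrations, not XReplicator's schema. A real health check would also confirm with Azure that the credentials work and the resources exist.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AzureTargetConfig:
    """Hypothetical shape of an Azure cloud target; field names are illustrative."""
    tenant_id: str
    client_id: str        # service principal (app registration) client ID
    client_secret: str    # keep real secrets in a secret store, not in code
    subscription_id: str
    region: str
    resource_group: str
    vnet: str
    subnet: str

    def missing_fields(self) -> list[str]:
        """Names of fields that are empty or whitespace-only."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def is_healthy(self) -> bool:
        # Completeness only; does not contact Azure.
        return not self.missing_fields()


target = AzureTargetConfig(
    tenant_id="<tenant-guid>",
    client_id="<app-guid>",
    client_secret="<secret>",
    subscription_id="<subscription-guid>",
    region="westeurope",
    resource_group="rg-dr-landing",
    vnet="vnet-dr",
    subnet="",
)
print(target.missing_fields())  # ['subnet']
print(target.is_healthy())      # False
```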

Setup Flow

  1. Install and start the XReplicator backup server in the Azure DR landing zone.
  2. Install agents on the Azure source VMs.
  3. Configure backup schedules and confirm snapshots are completing.
  4. Open the web UI and go to DR.
  5. Enable DR for each source disk that must be protected.
  6. Wait for each source to become healthy.
  7. Add an Azure cloud target with tenant, client, secret, subscription, and region.
  8. Create a DR blueprint for the workload.
  9. Select the source OS and data disks.
  10. Map the target resource group, VNet, subnet, VM size, and disk strategy.
  11. Run precheck.
  12. Trigger failover only when precheck passes.
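Steps 8 to 10 produce a blueprint row that maps the selected disks onto Azure resources. The sketch below models such a row and reports which mapping fields are still unset. `BlueprintRow`, `DiskStrategy`, and its two values are hypothetical names that only mirror the "snapshots or managed disks" choice described under Failover Behavior. XReplicator's actual strategy options may differ.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class DiskStrategy(Enum):
    # Hypothetical values mirroring "Azure snapshots or managed disks".
    SNAPSHOT = "snapshot"
    MANAGED_DISK = "managed_disk"


@dataclass
class BlueprintRow:
    """Hypothetical blueprint row: one source VM mapped to its Azure target."""
    os_disk: str
    data_disks: list[str] = field(default_factory=list)
    region: str = ""
    resource_group: str = ""
    vnet: str = ""
    subnet: str = ""
    vm_size: str = ""
    os_disk_strategy: Optional[DiskStrategy] = None
    data_disk_strategy: Optional[DiskStrategy] = None

    def missing_fields(self) -> list[str]:
        """Names of mapping fields that are still unset (None or empty string)."""
        required = {
            "os_disk": self.os_disk,
            "region": self.region,
            "resource_group": self.resource_group,
            "vnet": self.vnet,
            "subnet": self.subnet,
            "vm_size": self.vm_size,
            "os_disk_strategy": self.os_disk_strategy,
            "data_disk_strategy": self.data_disk_strategy,
        }
        return [name for name, value in required.items() if value in (None, "")]


row = BlueprintRow(
    os_disk="app01-osdisk",
    data_disks=["app01-data0"],
    region="westeurope",
    resource_group="rg-dr-app01",
    vnet="vnet-dr",
    subnet="snet-app",
    vm_size="Standard_D4s_v5",
    os_disk_strategy=DiskStrategy.MANAGED_DISK,
)
print(row.missing_fields())  # ['data_disk_strategy']
```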

Readiness Gate

Precheck should pass only when:

  • DR is enabled for every selected source disk.
  • Source status is healthy.
  • The applied snapshot matches the desired snapshot.
  • The target Azure config is healthy.
  • Blueprint rows include region, resource group, VNet, subnet, VM size, OS disk strategy, and data disk strategy.

Do not trigger failover while a source is pending, backfilling, verifying, or degraded.
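The readiness gate above is a conjunction of independent checks. The following sketch evaluates them and returns every failure instead of stopping at the first, so one precheck run reports everything that must be fixed. `SourceDiskState`, `precheck`, and the status strings are hypothetical stand-ins for whatever state XReplicator exposes. Any status other than "healthy" is treated as blocking, which covers pending, backfilling, verifying, and degraded.

```python
from dataclasses import dataclass


@dataclass
class SourceDiskState:
    """Hypothetical per-disk DR state as a precheck might read it."""
    disk_id: str
    dr_enabled: bool
    status: str            # e.g. "healthy", "pending", "backfilling", "verifying", "degraded"
    applied_snapshot: str  # snapshot currently applied to the staging disk
    desired_snapshot: str  # snapshot the staging disk should be synced to


def precheck(disks: list[SourceDiskState],
             target_config_healthy: bool,
             blueprint_missing: list[str]) -> list[str]:
    """Return every readiness failure; an empty list means precheck passes."""
    failures = []
    if not disks:
        failures.append("no source disks selected")
    for d in disks:
        if not d.dr_enabled:
            failures.append(f"{d.disk_id}: DR is not enabled")
        # Anything other than "healthy" (pending, backfilling, verifying,
        # degraded, or an unknown state) blocks failover.
        if d.status != "healthy":
            failures.append(f"{d.disk_id}: source status is {d.status!r}")
        if d.applied_snapshot != d.desired_snapshot:
            failures.append(
                f"{d.disk_id}: applied snapshot {d.applied_snapshot} "
                f"does not match desired {d.desired_snapshot}"
            )
    if not target_config_healthy:
        failures.append("Azure target config is not healthy")
    failures.extend(f"blueprint is missing {name}" for name in blueprint_missing)
    return failures


disks = [
    SourceDiskState("app01-os", True, "healthy", "snap-0042", "snap-0042"),
    SourceDiskState("app01-data0", True, "backfilling", "snap-0041", "snap-0042"),
]
for failure in precheck(disks, target_config_healthy=True, blueprint_missing=[]):
    print(failure)
# app01-data0: source status is 'backfilling'
# app01-data0: applied snapshot snap-0041 does not match desired snap-0042
```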

Failover Behavior

During failover, XReplicator uses the blueprint to:

  • Resolve the protected OS and data disks.
  • Create Azure snapshots or managed disks based on the selected strategy.
  • Create a network interface in the configured VNet/subnet.
  • Create the recovered Azure VM with the selected size.
  • Attach recovered data disks.
  • Record operation history and row-level execution logs.
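The failover steps run in a fixed order, and each outcome is recorded. A minimal sketch of that pattern follows, assuming each step is a callable that returns a short description or raises on error. `run_failover`, `Operation`, `LogEntry`, and the step names are hypothetical. The steps here are stubs that do not call Azure, and the last one fails on purpose to show how the log captures a partial run.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LogEntry:
    row_id: str    # blueprint row this step belongs to
    step: str
    status: str    # "ok" or "failed"
    detail: str
    at: float      # wall-clock time (time.time()) when the step finished


@dataclass
class Operation:
    """One failover run and its row-level log (hypothetical shape)."""
    row_id: str
    log: list[LogEntry] = field(default_factory=list)
    succeeded: bool = False


def run_failover(row_id: str,
                 steps: list[tuple[str, Callable[[], str]]]) -> Operation:
    """Run steps in order, log each outcome, and stop at the first failure."""
    op = Operation(row_id)
    for name, step in steps:
        try:
            detail = step()
        except Exception as exc:
            op.log.append(LogEntry(row_id, name, "failed", str(exc), time.time()))
            return op  # later steps are skipped; this sketch does not roll back
        op.log.append(LogEntry(row_id, name, "ok", detail, time.time()))
    op.succeeded = True
    return op


def simulated_attach_failure() -> str:
    raise RuntimeError("simulated attach failure")


steps = [
    ("resolve_disks", lambda: "os=app01-os, data=[app01-data0]"),
    ("create_disks", lambda: "managed disks created from staging disks"),
    ("create_nic", lambda: "NIC in vnet-dr/snet-app"),
    ("create_vm", lambda: "VM created with size Standard_D4s_v5"),
    ("attach_data_disks", simulated_attach_failure),
]
op = run_failover("app01", steps)
for entry in op.log:
    print(entry.step, entry.status, entry.detail)
print("succeeded:", op.succeeded)  # succeeded: False
```

Stopping at the first failure keeps the log honest about which resources exist. Any disks, NIC, or VM created before the failing step would still need to be reviewed and cleaned up.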

RTO Notes

The under-2-minute observation comes from an Azure-to-Azure drill in which the DR staging disks were already synced before failover. For production planning, measure your own RTO under realistic conditions, accounting for:

  • VM size and disk count.
  • OS boot time.
  • Azure API response time in the selected region.
  • Network security rules and route propagation.
  • Application startup and dependency checks.
  • Manual approval or DNS/load-balancer cutover steps.
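Measuring your own RTO is easier when each drill records a few milestone timestamps and the phases are computed from them. This separates infrastructure time from OS boot and application startup. The milestone names below are hypothetical, and the timestamps are placeholders for illustration, not measurements. Add a milestone such as a DNS or load-balancer cutover if that step is part of your recovery.

```python
from datetime import datetime


def rto_breakdown(milestones: dict[str, datetime]) -> dict[str, float]:
    """Split one drill into phases (seconds) using recorded milestone times."""
    m = milestones

    def secs(start: str, end: str) -> float:
        return (m[end] - m[start]).total_seconds()

    return {
        "infrastructure": secs("failover_started", "vm_running"),
        "os_boot": secs("vm_running", "os_booted"),
        "app_startup": secs("os_booted", "app_ready"),
        "total": secs("failover_started", "app_ready"),
    }


# Placeholder timestamps for illustration only; record real ones in each drill.
drill = {
    "failover_started": datetime(2024, 1, 1, 12, 0, 0),
    "vm_running": datetime(2024, 1, 1, 12, 1, 10),
    "os_booted": datetime(2024, 1, 1, 12, 1, 40),
    "app_ready": datetime(2024, 1, 1, 12, 3, 5),
}
print(rto_breakdown(drill))
# {'infrastructure': 70.0, 'os_boot': 30.0, 'app_startup': 85.0, 'total': 185.0}
```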

Operational Checklist

  • Run at least one failover drill per protected workload.
  • Record the VM boot time and application-ready time separately.
  • Keep Azure service principal credentials rotated and scoped to the DR resource groups they manage.
  • Confirm staging disks are not mounted by other workloads.
  • Keep a written cutover plan for DNS, load balancers, and application-specific validation.
  • Treat failback as a separate, controlled reverse-replication operation.
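Scoping the service principal to the DR resource groups can be checked mechanically. Azure role assignments carry a scope string, and a resource-group scope has the form `/subscriptions/<subscription-id>/resourceGroups/<name>`. The sketch below flags any assigned scope that is not inside an allowed DR resource group, such as a subscription-wide assignment. The function names are hypothetical. The list of assigned scopes is assumed to come from your own tooling, for example the `scope` values returned by `az role assignment list --assignee <app-id>`.

```python
def rg_scope(subscription_id: str, resource_group: str) -> str:
    """Build an Azure resource-group scope string."""
    return f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"


def scopes_outside(assigned: list[str], allowed: list[str]) -> list[str]:
    """Return assigned scopes that are not inside any allowed resource group.

    Azure resource IDs are case-insensitive, so comparisons are lowercased.
    A child resource of an allowed group counts as inside it; the "/" suffix
    check stops "rg-dr-app01" from matching "rg-dr-app010".
    """
    allowed_norm = [a.rstrip("/").lower() for a in allowed]
    outside = []
    for scope in assigned:
        s = scope.rstrip("/").lower()
        if not any(s == a or s.startswith(a + "/") for a in allowed_norm):
            outside.append(scope)
    return outside


sub = "00000000-0000-0000-0000-000000000000"
allowed = [rg_scope(sub, "rg-dr-app01")]
assigned = [rg_scope(sub, "rg-dr-app01"), f"/subscriptions/{sub}"]
print(scopes_outside(assigned, allowed))
# ['/subscriptions/00000000-0000-0000-0000-000000000000']
```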

Current Scope

Supported in v1.3.0:

  • Azure-to-Azure DR setup.
  • DR source enablement and staging sync.
  • Strict precheck.
  • Azure target configuration.
  • Blueprint-driven Azure failover.
  • Operation history and row-level logs.

Not yet the primary path:

  • AWS/GCP failover orchestration.
  • Automated failback.
  • Application-specific quiesce hooks.

Use the manual cloud runbooks for provider flows that are not yet orchestrated.
