XReplicator Backup, Disaster Recovery, and Ransomware-Resilient Restore

Overview

XReplicator v1.3.1 provides an Azure-to-Azure disaster recovery workflow from the web UI. Operators enable DR for source disks, keep DR-side staging disks in sync, validate readiness with precheck, trigger Azure VM failover from a blueprint, and then use the failback workflow to return data to the primary VM when the incident is over.

In measured Azure-to-Azure DR drills, a VM with a 30 GB OS disk and a 4 GB data disk completed failover in under 40 seconds with attach-as-is disks and in around 70 seconds when disks were created from snapshots. These numbers assume DR staging disks are already healthy and synced. Actual RTO depends on Azure API latency, VM size, OS boot time, networking, and application startup.

Architecture

The Azure DR flow uses these components:

Source agents protect Windows and Linux VM disks through block-level backups.
Backup server stores snapshots and coordinates DR source state.
DR staging disks are raw Azure disks attached to the DR-side environment and kept current by XReplicator.
Web UI owns cloud targets, blueprints, precheck, failover, failback, and operation history.
Azure target config stores tenant, subscription, region, resource group, VNet, subnet, VM size, and credentials.

Prerequisites

Before enabling DR, confirm:

The backup server and web UI run in the Azure DR landing zone.
Source agents are connected and completing backups.
Raw target disks are attached to the DR-side environment and are at least as large as their source disks.
Target disks are not mounted, not read-only, and not already assigned to another DR source.
The Azure service principal can create snapshots, disks, NICs, and VMs in the target resource group and read the target network.
The target VNet, subnet, route tables, NSGs, and DNS/cutover plan are ready.
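
The target-disk conditions above can be expressed as a small eligibility check. This is a minimal sketch, not XReplicator code: the TargetDisk type, its field names, and target_disk_blockers are hypothetical and only restate the prerequisites listed here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TargetDisk:
    """Hypothetical description of a DR-side raw disk (not an XReplicator type)."""
    name: str
    size_bytes: int
    mounted: bool = False
    read_only: bool = False
    assigned_source: Optional[str] = None  # ID of the DR source already using it


def target_disk_blockers(source_size_bytes: int, target: TargetDisk) -> List[str]:
    """Return the reasons a target disk fails the prerequisites; empty means eligible."""
    blockers = []
    if target.size_bytes < source_size_bytes:
        blockers.append("target is smaller than the source disk")
    if target.mounted:
        blockers.append("target is mounted")
    if target.read_only:
        blockers.append("target is read-only")
    if target.assigned_source is not None:
        blockers.append(f"target is already assigned to {target.assigned_source}")
    return blockers
```

Returning a list of reasons rather than a single boolean lets an operator see every problem with a disk in one pass.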

DR Page Controls

The DR page is organized into these tabs:

Tab	Purpose
Sources	Enable or disable DR per protected source disk, monitor sync state, retry backfill, and run health checks.
Targets	Store Azure target configuration such as tenant, subscription, resource group, region, VNet, subnet, and VM size choices.
Blueprints	Map one source VM’s OS and data disks to a target cloud config and disk strategy.
Failover	Run precheck, review readiness, trigger failover, and follow row-level logs.
Failback	Prepare mappings from the DR VM back to the primary VM, check primary disks, sync data back, and complete failback.
History	Review previous failover and failback operations with timestamps, status, and execution details.

Enable DR for Source Disks

Open DR > Sources.
Locate each source VM disk that must be protected.
Click Enable DR.
If this is the first time the source is mapped to a DR target disk, choose how to prepare the newly mapped target disk:
- Wipe and zero removes filesystem signatures and zeros the beginning and end of the target disk before DR sync starts.
- Continue without wiping leaves existing signatures in place and starts sync without disk preparation.
Wait for the source to become healthy.
If the initial sync fails, fix the underlying problem and then use Retry backfill.
Use Check health to verify the source can still reconcile to its desired snapshot.

Use Wipe and zero for newly attached raw DR disks unless you have already prepared the disk out of band. Never select a disk that contains data you need to keep.
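
To illustrate what zeroing the beginning and end of a disk means, here is a minimal sketch that works on any seekable binary stream. It is not the product's implementation: the 1 MiB span, zero_ranges, and zero_head_and_tail are assumptions made for illustration. Both ends matter because partition tables and filesystem signatures usually sit near the start of a disk, while GPT keeps a backup header at the end.

```python
import io
from typing import BinaryIO, List, Tuple

ZERO_SPAN = 1024 * 1024  # assumed 1 MiB; the span XReplicator uses is not stated here


def zero_ranges(disk_size: int, span: int = ZERO_SPAN) -> List[Tuple[int, int]]:
    """Return (offset, length) ranges covering the head and tail of a disk."""
    if disk_size <= 0:
        return []
    if disk_size <= 2 * span:
        return [(0, disk_size)]  # small disk: head and tail overlap, zero it all
    return [(0, span), (disk_size - span, span)]


def zero_head_and_tail(dev: BinaryIO, disk_size: int, span: int = ZERO_SPAN) -> None:
    """Overwrite the head and tail ranges of a seekable binary stream with zeros."""
    for offset, length in zero_ranges(disk_size, span):
        dev.seek(offset)
        dev.write(b"\x00" * length)
    dev.flush()


# Demonstration against an in-memory buffer, never a real device.
demo = io.BytesIO(b"\xff" * 64)
zero_head_and_tail(demo, 64, span=8)
```

Running the sketch against an in-memory buffer or a scratch file is a safe way to see which byte ranges would be touched before any real disk is involved.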

Configure Azure Targets

Open DR > Targets.
Add an Azure target.
Enter tenant ID, client ID, client secret, subscription ID, target region, target resource group, network resource group, VNet, subnet, and default VM size options.
Save the target and confirm it is available when creating a blueprint.

Keep service principal permissions scoped to the DR resource groups wherever possible.
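
The fields entered on the Targets tab can be modeled as a simple record with a completeness check. This is a sketch under stated assumptions: AzureTargetConfig, its field names, and missing_fields are hypothetical and do not describe how XReplicator stores targets.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class AzureTargetConfig:
    """Hypothetical mirror of the Targets form; field names are illustrative."""
    tenant_id: str = ""
    client_id: str = ""
    client_secret: str = field(default="", repr=False)  # keep the secret out of repr output
    subscription_id: str = ""
    region: str = ""
    resource_group: str = ""
    network_resource_group: str = ""
    vnet: str = ""
    subnet: str = ""
    vm_sizes: List[str] = field(default_factory=list)


def missing_fields(cfg: AzureTargetConfig) -> List[str]:
    """List the names of fields that are empty, in declaration order."""
    return [f.name for f in fields(cfg) if not getattr(cfg, f.name)]
```

Excluding the client secret from the repr is a small safeguard against leaking it into logs when a config object is printed.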

Create a Blueprint

Open DR > Blueprints.
Create a blueprint for one protected VM.
Select the source client and all disks that make up the VM.
Map the row to the Azure target config, region, resource group, network resource group, VNet, subnet, VM size, and optional hostname override.
Choose OS and data disk strategies:
- Attach as-is uses already-synced DR disks directly. This is the fastest measured path.
- Create from snapshot creates Azure snapshots and new managed disks before VM creation. This is useful when you want the failover VM to run from isolated copies.
Save the blueprint.

For Windows Server OS disks, prefer the specialized OS-disk attach path from a snapshot-derived managed disk rather than treating the disk as a generalized Sysprep image.
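
The difference between the two disk strategies is the amount of per-disk work done before VM creation. The sketch below makes that explicit. DiskStrategy, disk_steps, and the step labels are hypothetical names for illustration, not product identifiers or log messages.

```python
from enum import Enum
from typing import List


class DiskStrategy(Enum):
    ATTACH_AS_IS = "attach_as_is"
    CREATE_FROM_SNAPSHOT = "create_from_snapshot"


def disk_steps(strategy: DiskStrategy) -> List[str]:
    """Return the per-disk work a strategy implies before VM creation."""
    if strategy is DiskStrategy.ATTACH_AS_IS:
        # The synced DR disk itself becomes the VM's disk.
        return ["attach synced DR disk"]
    if strategy is DiskStrategy.CREATE_FROM_SNAPSHOT:
        # The VM runs from an isolated copy; the DR disk stays untouched.
        return ["create snapshot", "create managed disk", "attach new disk"]
    raise ValueError(f"unknown strategy: {strategy!r}")
```

The extra snapshot and managed-disk steps are why the snapshot path takes longer, and also why it leaves the staging disk unmodified by the failover VM.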

Precheck and Readiness

Open DR > Failover, select the blueprint, and run precheck before every drill or production failover. Precheck should pass only when:

DR is enabled for every selected source disk.
Each source is healthy.
Applied snapshots match desired snapshots.
The Azure target config is complete.
The blueprint row has region, resource group, network, VM size, OS strategy, and data strategy.

Do not trigger failover while sources are pending, backfilling, verifying, or degraded unless you intentionally accept an older applied restore point.
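
The pass conditions above amount to a conjunction over per-disk state, target completeness, and blueprint fields. A minimal sketch follows, assuming a hypothetical SourceState record and a blueprint row represented as a plain dict; none of these names come from XReplicator.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SourceState:
    """Hypothetical per-disk DR state; field names are illustrative."""
    disk: str
    dr_enabled: bool
    status: str  # e.g. "healthy", "pending", "backfilling", "verifying", "degraded"
    desired_snapshot: Optional[str]
    applied_snapshot: Optional[str]


REQUIRED_ROW_KEYS = (
    "region", "resource_group", "network", "vm_size", "os_strategy", "data_strategy",
)


def precheck(sources: List[SourceState], row: dict, target_complete: bool) -> List[str]:
    """Return blockers; an empty list means every listed condition holds."""
    blockers = []
    for s in sources:
        if not s.dr_enabled:
            blockers.append(f"{s.disk}: DR is not enabled")
        if s.status != "healthy":
            blockers.append(f"{s.disk}: source is {s.status}")
        if s.applied_snapshot is None or s.applied_snapshot != s.desired_snapshot:
            blockers.append(f"{s.disk}: applied snapshot does not match desired")
    if not target_complete:
        blockers.append("Azure target config is incomplete")
    for key in REQUIRED_ROW_KEYS:
        if not row.get(key):
            blockers.append(f"blueprint row is missing {key}")
    return blockers
```

A source that is backfilling typically shows up twice in the result, once for its status and once for the snapshot mismatch, which is the older-restore-point risk described above.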

Trigger Failover

Open DR > Failover.
Select the blueprint.
Run precheck and review warnings or blockers.
Click Trigger failover.
Confirm the operation.
Watch row-level logs for source selection, OS disk handling, data disk handling, NIC creation, VM creation, and completion.

Expected logs include messages such as:

Resolving the OS source disk.
Creating an OS snapshot or reusing an existing DR OS disk.
Creating data disk snapshots or attaching existing DR disks.
Creating a NIC in the configured subnet.
Creating the recovered Azure VM.
Completing the row for the target VM.

Validate the DR VM

After failover completes:

Confirm the VM boots cleanly.
Confirm OS and data disks are present and mounted as expected.
Confirm networking, NSGs, routes, DNS, and application dependencies.
Run application health checks.
Record VM-created time, OS-ready time, and application-ready time separately.
Keep the failover operation in history; failback uses that context.
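
Recording the three timestamps separately makes it possible to attribute delay to Azure provisioning, OS boot, or application startup. The helper below is a small sketch of that breakdown; rto_phases and the phase names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict


def rto_phases(triggered: datetime, vm_created: datetime,
               os_ready: datetime, app_ready: datetime) -> Dict[str, timedelta]:
    """Split a drill into phases so each delay can be attributed separately."""
    marks = [triggered, vm_created, os_ready, app_ready]
    if any(later < earlier for earlier, later in zip(marks, marks[1:])):
        raise ValueError("timestamps must be in chronological order")
    return {
        "vm_create": vm_created - triggered,
        "os_boot": os_ready - vm_created,
        "app_start": app_ready - os_ready,
        "total": app_ready - triggered,
    }
```

Only the vm_create phase corresponds to the failover step itself; the other two phases depend on the guest OS and the application.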

Failback Flow

Use failback after the primary environment is ready to receive data again.

Open DR > Failback.
Select the completed failover operation for the blueprint.
Click Prepare failback so XReplicator resolves the DR VM sources and maps them back to the original primary sources.
Review mappings carefully. DR VM disks should be treated as failover DR sources, not new primary sources.
Run Check primary disks.
Resolve blockers before syncing. Data disks should be unmounted, quiesced, or otherwise safe to overwrite on the primary side.
Start failback sync.
Monitor progress and logs.
Complete failback only after validation passes.

For OS disks, avoid live overwrite of a running primary OS disk. Use the product’s blocked/rescue-mode guidance for OS disk recovery when needed.
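
The overwrite-safety rules for primary disks can be summarized as a check run before sync starts. This is a sketch only: PrimaryDisk and primary_disk_blockers are hypothetical, and the real Check primary disks step may evaluate more conditions than the two stated here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PrimaryDisk:
    """Hypothetical view of a primary-side disk before failback sync."""
    name: str
    is_os_disk: bool
    mounted: bool
    vm_running: bool


def primary_disk_blockers(disk: PrimaryDisk) -> List[str]:
    """Return reasons the disk is not yet safe to overwrite; empty means clear."""
    blockers = []
    if disk.is_os_disk and disk.vm_running:
        blockers.append(f"{disk.name}: OS disk of a running VM; use rescue-mode recovery")
    elif not disk.is_os_disk and disk.mounted:
        blockers.append(f"{disk.name}: data disk is still mounted")
    return blockers
```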

RTO Notes

The measured drill numbers are useful for planning but are not guarantees:

Strategy	Measured result	Notes
Attach as-is	Under 40 seconds	Fastest path because the DR disks are already synced and can be attached directly.
Create from snapshot	Around 70 seconds	Adds Azure snapshot and managed disk creation before VM creation.

The measured workload was one VM with a 30 GB OS disk and a 4 GB data disk. For production planning, measure your own RTO using the same VM size, disk count, region, network, and application startup procedure you will use during an incident.
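
When measuring your own RTO, it helps to summarize repeated drills per strategy and compare the worst case against the target. The following is a minimal sketch; summarize_drills, its input shape, and the output keys are assumptions, and any durations passed in must come from your own drills.

```python
import statistics
from typing import Dict, List


def summarize_drills(seconds_by_strategy: Dict[str, List[float]],
                     rto_target_s: float) -> Dict[str, dict]:
    """Report median and worst-case failover time per strategy against a target."""
    summary = {}
    for strategy, samples in seconds_by_strategy.items():
        if not samples:
            continue  # no drills recorded for this strategy yet
        worst = max(samples)
        summary[strategy] = {
            "runs": len(samples),
            "median_s": statistics.median(samples),
            "worst_s": worst,
            "meets_target": worst <= rto_target_s,
        }
    return summary
```

Judging against the worst observed run rather than the median is the more conservative choice for incident planning.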

Operational Checklist

Run at least one failover and failback drill per protected workload.
Record attach-as-is and snapshot-based RTO separately.
Keep source, target, and blueprint naming consistent.
Keep DR target disks raw and dedicated to XReplicator.
Rotate Azure credentials and keep permissions least-privilege.
Document DNS, load balancer, firewall, and application validation steps.
Review operation history after every drill and fix recurring warnings.

Current Scope

Supported in v1.3.1:

Azure-to-Azure DR setup.
DR source enablement and staging sync.
Target disk preparation prompt.
Strict precheck.
Azure target configuration.
Blueprint-driven Azure failover.
Attach-as-is and snapshot-based disk strategies.
Failback preparation, primary disk checks, sync monitoring, and operation history.

AWS and GCP are documented as cold DR/manual runbook paths until provider orchestration is added.