SRM Site Recovery Manager best practices ~ Angels Technology

SRM Site Recovery Manager best practices

These are my notes on what I thought was important in the Site Recovery Manager 5.0 Performance and Best Practices white paper. They are for my own review: things I think I should go over and remember later. Good stuff here.

This white paper combines some of the VMware recommendations and best practices to achieve the fastest, or at least quicker, recovery for your SRM tests or actual outage recovery. One important tip at the end of the paper is that if you don't have VMware Tools installed and the guest shutdown timeout is set to a nonzero value, your recovery won't proceed. I haven't tested that, but it is something you DON'T want to find out in a live-fire exercise.

Why it is good to understand SRM:

VMware's SRM is a great DR tool for ensuring that critical VMs are available in case a site goes down.

Available here : http://www.vmware.com/files/pdf/techpaper/srm5-perf.pdf

Overview:

SRM requires a protected site and a recovery site
SRM server must be installed at each site.
Each site must be managed by its own vCenter Server.
The protected site is the source. The recovery site is the destination.
A protection group is a group of virtual machines that fail over together.
SRM supports two forms of replication: array-based replication (ABR) and host-based replication (HBR), also known as vSphere Replication (VR).
With ABR, the storage subsystem manages VM replication.
With VR, ESXi manages VM replication.
SRM supports up to 500 VMs with VR and 1000 with ABR.
VR replicates only the most recent data in the changed areas of a disk.
ABR requires identical storage arrays at both sites.
SRM automatically discovers datastores for array-based replication between the sites.
SRM does not impose similar hardware requirements across both sites. You can have a different number of ESXi hosts at the protected and recovery sites.
VRMS: vSphere Replication Management Server
VR: vSphere Replication
After executing a planned migration and a reprotect operation, your primary site becomes your secondary site.
Inventory mappings: associations between resource pools, virtual machine folders, and networks at the protected site and their destination counterparts at the recovery site.

Performance Considerations

Recovery Point Objective (RPO): defines the point in time to which data must be restored to meet service level agreements.
Recovery Time Objective (RTO): the duration of time and the service level within which a business process must be restored after a disaster (or disruption) in order to avoid unacceptable consequences associated with a break in business continuity.
ABR (array-based replication): RPO is fulfilled by the replication schedules configured on the storage array.
VR (vSphere Replication): set the RPO using the SRM plugin. The minimum VR RPO is 15 minutes.
The VR algorithm adjusts the replication schedule dynamically in order to fulfill the RPO.
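
To make the RPO note concrete for myself, here is a minimal Python sketch. It is not SRM code and calls no VMware API. The function names and the idea of comparing a "last sync" timestamp against the RPO are my own assumptions; only the 15 minute minimum comes from the notes above.

```python
from datetime import datetime, timedelta

# Minimum RPO for vSphere Replication, as stated in the notes above.
VR_MIN_RPO = timedelta(minutes=15)


def validate_vr_rpo(rpo: timedelta) -> timedelta:
    """Reject RPO targets below the VR minimum (hypothetical helper)."""
    if rpo < VR_MIN_RPO:
        raise ValueError(f"VR RPO must be at least {VR_MIN_RPO}, got {rpo}")
    return rpo


def rpo_violated(last_sync: datetime, now: datetime, rpo: timedelta) -> bool:
    """True when the newest replica is older than the RPO allows."""
    return now - last_sync > rpo
```

With a 30 minute RPO, a replica that is 15 minutes old is fine, while one that is an hour old would count as a violation.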

Server config and other stuff

VMware recommends installing the SRM database as close to the SRM server as possible, to reduce the round-trip time between them.

You can use the same database server to support the vCenter database instance and the Site Recovery Manager database instance, but it is good practice to maintain separate database servers for each database instance.

Creating a recovery plan

A recovery plan is the complete set of steps needed to recover (or test the recovery of) the protected VMs in one or more protection groups.

Planned migration: SRM attempts to shut down the protected site VMs and replicate outstanding changes to the recovery site before proceeding with the failover sequence.
Unplanned migration: SRM proceeds directly with the failover sequence without attempting to shut down the protected site VMs or replicate changes to the recovery site.
Reprotect (ABR only): reverses the direction of replication and automatically reprotects the protection groups.
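
The difference between the two migration types is easy to see as a step list. This is only a sketch of the notes above, with step names I made up; the real recovery plan in SRM has many more steps.

```python
def recovery_steps(planned: bool) -> list[str]:
    """Return the high-level step order for a planned or unplanned migration."""
    steps = []
    if planned:
        # Planned migration first tries a clean shutdown and a final sync.
        steps.append("shut down VMs at protected site")
        steps.append("replicate outstanding changes to recovery site")
    # Both kinds of migration end with the failover sequence.
    steps.append("run failover sequence")
    return steps
```

The two extra steps in the planned case are the reason a planned migration takes longer.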

It is good practice to have fewer, larger NFS volumes, because mounting a large number of volumes adds time during recovery.

It is good practice to have vSphere DRS enabled on the recovery site. SRM 5.0 leverages DRS to reserve sufficient resources during the recovery to power on all VMs.

Recovery is faster with iSCSI/FC storage than with NFS.
During a boot storm, latencies can increase as a result of I/O bottlenecks. You can edit defaultMaxBootAndShutdownOpsPerCluster to adjust for this.
Chart out dependencies and priorities between the virtual machines to be recovered, so that only the virtual machines that really need them are assigned individual dependencies, which impact recovery time.
Suspending virtual machines on the recovery site impacts recovery time.

Note: a planned migration takes longer than an unplanned one, because the clean shutdown takes time.
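
To picture why priorities and a cap on concurrent boots matter in a boot storm, here is a small Python sketch that groups VMs by priority and powers them on in limited batches. This is my own illustration, not how SRM implements it: `power_on_batches` and `max_concurrent` are hypothetical names, the cap only loosely stands in for a setting like defaultMaxBootAndShutdownOpsPerCluster, and I assume a lower priority number means "start first".

```python
from typing import Iterator


def power_on_batches(
    vms: list[tuple[str, int]], max_concurrent: int
) -> Iterator[list[str]]:
    """Yield VM names in priority order, at most max_concurrent per batch.

    Batches never mix priorities, so a lower-priority VM never starts
    alongside a higher-priority one.
    """
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    # Group VM names by priority, keeping their original order.
    by_priority: dict[int, list[str]] = {}
    for name, priority in vms:
        by_priority.setdefault(priority, []).append(name)
    # Assumption: lower number = recovered earlier.
    for priority in sorted(by_priority):
        group = by_priority[priority]
        for i in range(0, len(group), max_concurrent):
            yield group[i:i + max_concurrent]
```

A smaller cap means more batches and a longer recovery, but less I/O pressure per batch; that is the trade-off to think about before touching the setting.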

Advanced Settings/ vmware tools

VMware strongly recommends that VMware Tools be installed in all protected virtual machines. Many SRM recovery operations depend on the proper installation of VMware Tools.
With Tools: SRM waits for the OS heartbeat while powering on the virtual machine and waits for a network change while reconfiguring the recovered virtual machine.
With Tools: SRM waits for virtual machines to shut down on the protected site.
With Tools: SRM tries to gracefully shut down the virtual machines on the protected site.
SRM depends on VMware Tools to report the OS heartbeat and the completion of the network change.
If NO TOOLS: set the timeout values for recovery.powerOnTimeout and recovery.customizationTimeout to zero (0).
**** [IF NO TOOLS AND the guest shutdown timeout is NONZERO] the recovery will not proceed beyond the “Shutdown VMs at Protected Site” step. You MUST set recovery.skipGuestShutdown to true if you want your recovery plan to make any progress.
Normally a VM's swap file lives on the same datastore as the VM. With SRM, to prevent swap files from being replicated, create them on a non-replicated datastore. This reduces recovery time.
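
The VMware Tools rules above boil down to a small decision, sketched here in Python so I remember it. The three setting names are the ones from the notes; the function name and its `guest_shutdown_timeout` parameter are hypothetical, and nothing here talks to SRM.

```python
def no_tools_overrides(
    tools_installed: bool, guest_shutdown_timeout: int
) -> dict[str, object]:
    """Return the advanced-setting overrides the notes call for."""
    overrides: dict[str, object] = {}
    if not tools_installed:
        # Without Tools there is no heartbeat or network-change report to wait for.
        overrides["recovery.powerOnTimeout"] = 0
        overrides["recovery.customizationTimeout"] = 0
        if guest_shutdown_timeout != 0:
            # Otherwise the plan stalls at "Shutdown VMs at Protected Site".
            overrides["recovery.skipGuestShutdown"] = True
    return overrides
```

For a VM with Tools installed the function returns an empty dict, meaning the defaults can stay as they are.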

Angels Technology

Saturday, July 14, 2012
