SRM Site Recovery Manager best practices
- SRM requires a protected site and a recovery site.
- An SRM server must be installed at each site.
- Each site must be managed by its own vCenter Server.
- The protected site is the source. The recovery site is the destination.
- A protection group is a group of virtual machines that fail over together.
- SRM supports two forms of replication: array-based replication (ABR) and host-based replication (HBR), also known as vSphere Replication (VR).
- Array-based replication (ABR): the storage subsystem manages VM replication.
- Host-based replication (vSphere Replication): ESXi manages VM replication.
- SRM supports up to 500 VMs with VR and 1000 with ABR.
- vSphere Replication (VR) replicates only the most recent data in the changed disk areas.
- ABR requires identical storage arrays at both sites.
- SRM automatically discovers datastores for array-based replication between the sites.
- SRM does not impose similar hardware requirements across both sites. You can have a different number of ESXi hosts at the protected and recovery sites.
- VRMS: vSphere Replication Management Server
- VR: vSphere Replication
- After executing a planned migration and a reprotect operation, your primary site becomes your secondary site.
- Inventory mappings: associations between resource pools, virtual machine folders, and networks at the
- Recovery Point Objective (RPO): defines the point in time to which data must be restored to meet service level agreements.
- Recovery Time Objective (RTO): the duration of time and service level within which a business process must be restored after a disaster (or disruption) to avoid unacceptable consequences associated with a break in business continuity.
- ABR: the RPO is fulfilled by the replication schedules configured on the storage array.
- VR: set the RPO using the SRM plug-in. The minimum RPO for VR is 15 minutes.
- The VR algorithm adjusts the replication schedule dynamically in order to fulfill the RPO.
- It is good practice to have fewer but larger NFS volumes so that the time taken to mount a large number
- It is good practice to have vSphere DRS enabled on the recovery site. SRM 5.0 leverages DRS to reserve
- Recovery time is faster with iSCSI/FC than with NFS.
- During a boot storm, these latencies could increase as a result of I/O bottlenecks. You can edit defaultMaxBootAndShutdownOpsPerCluster to adjust for this.
- Chart out the dependencies and priorities between the virtual machines to be recovered so that only the virtual machines that require them are assigned individual dependencies, because dependencies impact recovery time.
- Suspending virtual machines on the recovery site impacts recovery time.
- VMware strongly recommends that VMware Tools be installed in all protected virtual machines. Many SRM recovery operations depend on the proper installation of VMware Tools.
- With Tools: SRM waits for the OS heartbeat while powering on the virtual machine and waits for a network change while reconfiguring the recovered virtual machine.
- With Tools: SRM waits for virtual machines to shut down on the protected site.
- With Tools: SRM tries to gracefully shut down the virtual machines on the protected site.
- SRM depends on VMware Tools to report the OS heartbeat and completion of the network change.
- If there are no Tools: you can choose to set the timeout values for recovery.powerOnTimeout and recovery.customizationTimeout to zero (0).
- Important: if VMware Tools is not installed and the guest shutdown timeout (shutdownguesttimeout) is nonzero, recovery will not proceed beyond the “Shutdown VMs at Protected Site” step. You MUST set recovery.skipGuestShutdown to true if you want your recovery plan to make any progress.
- Normally a VM's swap file is on the datastore with the VM. With SRM, to prevent swap files from being replicated, create
These are my notes on what I thought was important in the Site Recovery Manager 5.0 Performance and Best Practices white paper. They are for my own review: things I think I should go over and remember later. Good stuff here.
This white paper combines some of the VMware recommendations and best practices for achieving the fastest, or at least quicker, recovery for your SRM tests or an actual outage recovery. One important tip at the end of the paper is that if you don't have VMware Tools installed and the guest shutdown timeout is set to a nonzero value, your recovery won't proceed. I haven't tested that, but it is something you DON'T want to find out in a live-fire exercise.
Why it is good to understand SRM:
VMware's SRM is a great DR tool for ensuring that critical VMs are available in case a site goes down.
Available here: http://www.vmware.com/files/pdf/techpaper/srm5-perf.pdf
Overview:
protected site and their destination counterparts at the recovery site.
Performance Considerations
Server config and other stuff
VMware recommends that the SRM database be installed as close to the SRM server as possible, to reduce the round-trip time between them.
You can use the same database server to support the vCenter database instance and the Site Recovery Manager database instance, but it is good practice to maintain separate database servers for each database instance.
Creating a recovery plan
A recovery plan is the complete set of steps needed to recover (or test the recovery of) the protected VMs in one or more protection groups.
Planned migration: SRM attempts to shut down the protected site VMs and replicate outstanding changes to the recovery site before proceeding with the failover sequence.
Unplanned migration: SRM proceeds directly with the failover sequence without attempting to shut down the protected site VMs or replicate changes to the recovery site.
Reprotect (ABR only): reverses the direction of replication and automatically reprotects the protection groups.
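To keep the two failover modes straight, here is a rough Python sketch of the ordering described above. It only illustrates the sequence and is not SRM's API. The function name `failover_steps` and the step strings are made up for this example.

```python
def failover_steps(planned: bool) -> list[str]:
    """Return the order of steps for a failover, per the notes above.

    Hypothetical helper for illustration only; the step names are
    not SRM identifiers.
    """
    steps = []
    if planned:
        # Planned migration: clean shutdown and a final sync come first.
        steps.append("shut down VMs at protected site")
        steps.append("replicate outstanding changes to recovery site")
    # Both modes end with the failover sequence itself.
    steps.append("run failover sequence at recovery site")
    return steps


if __name__ == "__main__":
    for mode, planned in (("planned", True), ("unplanned", False)):
        print(mode, "->", failover_steps(planned))
```

The unplanned path skips straight to the last step, which is the whole difference between the two modes.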
of such volumes decreases during the recovery.
sufficient resources during the recovery to power on all VMs.
Planned migration takes longer than unplanned migration because a clean shutdown takes time.
Advanced Settings / VMware Tools
them on a non-replicated datastore. This reduces recovery time.
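The VMware Tools notes in this post boil down to a small checklist, which I sketched in Python so I can reason about it. This is only a sketch under my own assumptions: `check_tools_settings` is a made-up helper, `settings` is a plain dict and not anything read from SRM, and the key `shutdownGuestTimeout` is my assumed name for the guest shutdown timeout. The three `recovery.*` names come from the notes.

```python
def check_tools_settings(tools_installed: bool, settings: dict) -> list[str]:
    """Return warnings for a VM's recovery settings, per the notes above.

    `settings` is a plain dict standing in for the advanced settings;
    nothing here talks to SRM. "shutdownGuestTimeout" is an assumed
    key name for the guest shutdown timeout.
    """
    warnings = []
    if tools_installed:
        # With Tools, SRM can wait on heartbeats and guest shutdown.
        return warnings

    # Without Tools there is no heartbeat or network-change report to wait for.
    for key in ("recovery.powerOnTimeout", "recovery.customizationTimeout"):
        if settings.get(key, 0) != 0:
            warnings.append(f"consider setting {key} to 0 (no VMware Tools)")

    # A nonzero shutdown timeout without Tools stalls the plan unless
    # guest shutdown is skipped.
    timeout = settings.get("shutdownGuestTimeout", 0)
    skip = settings.get("recovery.skipGuestShutdown", False)
    if timeout != 0 and not skip:
        warnings.append("set recovery.skipGuestShutdown to true")
    return warnings


if __name__ == "__main__":
    example = {"recovery.powerOnTimeout": 300, "shutdownGuestTimeout": 60}
    for line in check_tools_settings(False, example):
        print(line)
```

Missing keys are treated as zero or false here, which is a simplification. A real check would need to know the actual defaults.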