In today’s IT infrastructures, whether virtual or physical, disaster recovery is essential. To stay competitive, a business must be able to keep operating with minimal downtime. Two major factors in keeping a service running are High Availability and Recoverability.
Recoverability is the guarantee that a service and its data are protected against failures. High Availability is the guarantee that the service remains operational and that failures are handled in such a way that its users do not even notice them. There are many ways in which businesses plan and implement disaster recovery.
When discussing High Availability and Recoverability, two metrics are important: Recovery Point Objective (RPO) and Recovery Time Objective (RTO).
What are RPO and RTO?
RPO defines the amount of data an organization can afford to lose, measured in time. RTO defines the amount of downtime the organization can afford before its services become operational again.
Both RPO and RTO are expressed as time. For example, an organization can have an RPO of 4 hours and an RTO of 1 hour. This means it can afford to lose up to 4 hours of data, but it can only afford up to 1 hour of service downtime.
RTO only defines how long a service can remain unavailable; it does not account for data loss. RPO defines how much data loss can be afforded.
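To make the difference between the two objectives concrete, here is a minimal Python sketch that checks a single incident against both targets. The function name `meets_objectives`, its parameters and the timestamps are hypothetical and chosen only for illustration; the RPO of 4 hours and RTO of 1 hour come from the example above.

```python
from datetime import datetime, timedelta

def meets_objectives(last_copy, outage_start, service_restored, rpo, rto):
    """Compare one incident against the RPO and RTO targets."""
    # Data written after the last good copy is lost when the outage starts.
    data_loss_window = outage_start - last_copy
    # Downtime is how long the service stayed unavailable.
    downtime = service_restored - outage_start
    return {
        "data_loss_window": data_loss_window,
        "downtime": downtime,
        "rpo_met": data_loss_window <= rpo,
        "rto_met": downtime <= rto,
    }

# Illustrative incident (made-up timestamps): the last replicated copy is
# from 7:00, the outage starts at 10:00, and service is back at 11:30.
result = meets_objectives(
    last_copy=datetime(2024, 1, 1, 7, 0),
    outage_start=datetime(2024, 1, 1, 10, 0),
    service_restored=datetime(2024, 1, 1, 11, 30),
    rpo=timedelta(hours=4),
    rto=timedelta(hours=1),
)
print(result)
```

In this made-up incident the RPO is met (3 hours of data lost against a 4-hour allowance) while the RTO is missed (1.5 hours of downtime against a 1-hour allowance), which shows that the two objectives are evaluated independently.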
WRT (Work Recovery Time): the maximum tolerable amount of time needed to verify system and/or data integrity. This could be, for example, checking the databases and logs and making sure applications or services are running and available.
MTD (Maximum Tolerable Downtime) = RTO + WRT.
For example, suppose the RPO is 4 hours, the RTO is 2 hours and the WRT is 30 minutes, and a disaster occurs at 10 am. Data must be restored to a point no earlier than 6 am, and the site must be recovered by 12 pm. The WRT then starts, during which the integrity of the servers is checked, so the site must be fully functional by 12:30 pm. The MTD in this case is 2 hours 30 minutes.
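The arithmetic in the 10 am example can be reproduced with a short Python sketch. The function name `recovery_timeline` and the dictionary keys are hypothetical, and the calendar date is arbitrary; only the times of day and durations come from the example.

```python
from datetime import datetime, timedelta

def recovery_timeline(disaster, rpo, rto, wrt):
    """Derive the key deadlines for a disaster that occurs at `disaster`."""
    mtd = rto + wrt  # MTD = RTO + WRT
    return {
        "oldest_acceptable_restore_point": disaster - rpo,
        "service_recovered_by": disaster + rto,
        "fully_verified_by": disaster + mtd,
        "mtd": mtd,
    }

# Disaster at 10:00 am with RPO = 4 h, RTO = 2 h, WRT = 30 min.
disaster = datetime(2024, 1, 1, 10, 0)  # the date itself is arbitrary
timeline = recovery_timeline(
    disaster,
    rpo=timedelta(hours=4),
    rto=timedelta(hours=2),
    wrt=timedelta(minutes=30),
)
for name, value in timeline.items():
    print(f"{name}: {value}")
```

The result matches the figures in the text: a restore point of 6:00 am, recovery by 12:00 pm, full functionality by 12:30 pm, and an MTD of 2 hours 30 minutes.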
Many companies still follow a manual, traditional approach to BC/DR. To orchestrate this process, VMware offers a product called Site Recovery Manager.
Site Recovery Manager (SRM)
Site Recovery Manager (SRM) is orchestration software used to automate disaster recovery testing and failover. It can be configured to leverage either vSphere Replication or a supported array-based replication. SRM does not perform an automatic failover: there is no intelligence built into SRM that would detect a disaster or outage and fail over the VMs. The disaster recovery process must be initiated manually. Hence, it is not a high availability solution; it is purely a tool that orchestrates a Recovery Plan.
SRM requires the protected and recovery sites to be managed by separate instances of vCenter Server. It also requires an SRM instance at both sites. SRM’s functionality is currently available only via the vSphere Client and not the vSphere Web Client. Hence, an SRM plugin needs to be installed on the same machine where the vSphere Client is installed.
Refer to the following figure:
SRM has to be installed at both the protected and recovery sites for the disaster recovery setup to work. The installation process is identical at either site; the only difference is that at each site you register the SRM installation with the vCenter Server managing that site.
SRM can be installed either on the same machine as vCenter Server or on a different machine. The choice depends on how you want to size or separate the service-providing machines in your infrastructure. The most common deployment model is to have both vCenter and SRM on the same machine. The rationale is that SRM does not work in standalone mode: if your vCenter Server goes down, there is no way to access SRM. Like vCenter Server, SRM can be installed on a physical or virtual machine.
Another factor to take into account is the installation of Storage Replication Adapters (SRAs). SRAs have to be installed on the same machine where SRM is installed, and some SRAs need a reboot after installation. So it is important to read through the storage vendor’s documentation before making a deployment choice for SRM. If vCenter downtime is not acceptable, consider installing SRM on a separate machine.
Note: An SRA is only required when you use array-based replication to replicate from the protected site to the recovery site.
1. Download the SRM installer from the VMware site and double-click the .exe file to start the installation process.
10. Make a selection of your choice and click on Next to continue. Here, I have chosen to let the installer generate a new certificate. Use the second option if you already have a certificate file from your certificate authority. VMware recommends using CA-signed certificates for all its products.
When all these components are put together, a paired site protected by SRM looks like this:
Note: I have used a few pictures in this post from the SRM book written by Abhilash GB and would like to thank him for this 🙂