Disaster Recovery as a Service (DRaaS) burst onto the scene about four years ago. However, most DRaaS solutions don’t live up to expectations. Frequently built on backup, these solutions try to make recovery act like high availability. Getting from data sitting in a backup store to a running application can take hours. The cloud adds another variable that IT needs to take into consideration, virtual machine (VM) transformation, which can also take hours. Datrium recently announced a DRaaS capability that promises to live up to IT’s original expectations of what DRaaS should be.
When it comes to applications, disaster recovery almost always means using replication, not backup, to create the secondary copy at a disaster recovery site. Disaster recovery also means multiple recoveries and accounting for each application’s dependencies on other applications to provide necessary services and functionality. In most cases, data centers don’t need recovery within minutes during a disaster; they need recovery within a couple of hours. Organizations need to balance the cost of real-time DR against the likelihood of an actual full-scale disaster. If the entire data center is lost, recovery within a couple of hours is a very good compromise.
Disaster recovery within hours of declaration is certainly within reach technologically for most organizations. There are storage systems that can replicate to another location, and there are even a few vendors that can automate the recovery order. The challenge is the cost of having a site sit idle until disaster strikes. A disaster that destroys a data center is a once-in-a-lifetime experience for most organizations; even declaring a disaster and starting the process is rare. Still, the ramifications of not being prepared if disaster does strike are too severe to ignore, so organizations constantly struggle with the balance.
Enter the cloud and DRaaS. DRaaS eliminates the second site and the cost of paying for idle equipment. It spins up compute power on demand. It sounds great on paper, but execution is a problem. IT still needs to prepare the cloud-based DR site correctly, and it needs runbook-like functionality to make sure that applications come online in the right order. In the cloud, though, IT also needs to factor in a transformation process that converts on-premises applications to run under cloud hypervisors. It is possible to automate the transformation process, but it does take hours per virtual machine. Once there, the customer has to learn a completely different administrative model just for the DR event, since it is no longer vSphere at any level. And many of the early providers that offer conversion for failover may not have a good answer for failback once the issue is resolved.
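To make the "right order" requirement concrete, here is a minimal sketch of how a runbook might derive a start-up order from application dependencies. It is an illustration only, not how any particular DRaaS product works; the application names and the dependency map are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map (an assumption for illustration):
# each application lists the services that must be running before it starts.
DEPENDENCIES = {
    "database": set(),
    "directory": set(),
    "app-server": {"database", "directory"},
    "web-frontend": {"app-server"},
}


def recovery_waves(dependencies):
    """Group applications into waves; every member of a wave can start in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose dependencies are up
        waves.append(ready)
        sorter.done(*ready)  # mark the wave as recovered
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(recovery_waves(DEPENDENCIES), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

Grouping into waves rather than a flat list reflects the fact that independent applications can be recovered at the same time, which matters when the goal is recovery within a couple of hours.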
Introducing Datrium DRaaS
Datrium is a provider of on-premises, disaggregated hyperconverged infrastructure (D-HCI) with its DVX solution. The disaggregation separates the scaling of storage capacity from CPU and storage performance, making HCI more efficient, as we discussed in our blog “What is the Hyperconverged Tax?” As an on-premises primary storage solution, Datrium has all the features IT professionals expect, including site-to-site replication. Datrium recently added ControlShift, part of the platform, which takes care of much of the secondary site automation that recovering from a disaster requires. Our briefing note “Multi-Cloud versus the Five Requirements of Data” details the ControlShift orchestration solution.
Datrium has now taken the next step, eliminating the need for a second site by leveraging the cloud and creating an end-to-end DRaaS solution. The solution replicates data from the on-premises DVX to a cloud instance of DVX, which stores data on cost-effective Amazon S3 storage. If the customer experiences a disaster or wants to perform a test, the service copies the S3 data from the cloud instance of DVX to VMware Cloud (VMC) on AWS. Using VMC allows Datrium to bypass the transformation issues, and the customer is now running in an environment that is exactly like what they were running on-premises before the disaster.
The Datrium DRaaS solution also provides an element missing from most DRaaS solutions: failback. The organization may not want to stay in the cloud forever, so getting data back on-premises is critical. The truth is that complete destruction of the original data center does not immediately follow most disaster declarations; most declarations are “just-in-case” situations. The impact is that, most of the time, data has changed at the DR site, which in the case of DRaaS is in the cloud, while the primary data center has all of its equipment intact.
The problem is that most DRaaS solutions’ failback “solution” is a complete recovery of the primary data center, even though most of the data at the data center is intact and only a few days old. Datrium provides an intelligent copy-back process that only needs to send back the data that has changed since the failover event occurred, saving the customer time and money by reducing egress fees.
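The idea of sending back only changed data can be sketched in a few lines. The source does not describe how Datrium identifies changed data, so the block-fingerprint comparison below is an assumption chosen purely for illustration. The function names and the block layout are hypothetical, and the sketch ignores blocks deleted in the cloud.

```python
import hashlib


def fingerprint(block: bytes) -> str:
    """Content hash used to detect whether a block has changed."""
    return hashlib.sha256(block).hexdigest()


def changed_blocks(primary: dict, cloud: dict) -> dict:
    """Return only the cloud blocks that are new or differ from the primary copy.

    Both arguments map a block ID to that block's bytes (a hypothetical layout).
    """
    delta = {}
    for block_id, data in cloud.items():
        original = primary.get(block_id)
        if original is None or fingerprint(original) != fingerprint(data):
            delta[block_id] = data
    return delta


if __name__ == "__main__":
    # State of the intact primary site at the moment of failover.
    primary = {0: b"alpha", 1: b"bravo", 2: b"charlie"}
    # State in the cloud after running there: block 1 rewritten, block 3 added.
    cloud = {0: b"alpha", 1: b"BRAVO", 2: b"charlie", 3: b"delta"}

    delta = changed_blocks(primary, cloud)
    full_bytes = sum(len(b) for b in cloud.values())
    delta_bytes = sum(len(b) for b in delta.values())
    print(f"blocks to send back: {sorted(delta)}")
    print(f"egress: {delta_bytes} bytes instead of {full_bytes}")
```

Egress is billed on the data that leaves the cloud, so shrinking the transfer to the delta is what reduces both the failback time and the fee.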
Datrium uniquely solves many of the challenges facing HCI, and now, with this latest release, is preparing to do the same thing for DRaaS. Its solution enables customers to eliminate the common problems with DRaaS, like the cost of storage and transformation times, and instead focus on the task at hand: rapid recovery from disaster. Datrium’s ControlShift provides the often-missing orchestration that IT needs for successful disaster recovery. It is a complete picture that, at a minimum, rivals what any other primary storage vendor provides and in most cases is far ahead of its competitors.