A real disaster is the ultimate test of what may be years' worth of planning. Disasters are unnerving because the IT professionals executing the recovery processes have the eyes of the entire organization and its customers on them. IT, which normally operates behind the scenes, is now front and center, and the pressure to rapidly return the organization to an operational state is immense. IT needs focus, not distraction. The problem is that most disaster recovery solutions add to the distraction rather than reduce it.
The modern data center no longer follows a one-app, one-server, one-LUN design. Many virtual servers now share a single physical server and a single LUN or volume. Most storage systems, though, still operate at the volume or LUN level. This lack of granularity creates a multitude of problems, making it difficult to isolate a VM for performance guarantees or problem resolution. It is especially problematic for data protection and disaster recovery.
Not all VMs are equally valuable to the organization. Some are mission critical, others are important and some are nice to have. Logically, you apply different levels of protection according to importance. Pragmatically, the only applications the organization needs in a ready-state at the DR site are the first wave of apps it must have to return to operation; these are typically the apps labeled mission critical. The problem is that without VM granularity, data protection processes like snapshots, backup jobs and replication have to treat all the VMs on the volume the same. The storage system can’t differentiate between mission critical, business important, nice to have and not needed at all. It just “sees” a bunch of ones and zeros.
Without VM granularity, IT has to apply a single snapshot schedule to all the VMs on a volume. Then the backup process has to back up all of those VMs. And, yes, while changed block tracking thins the amount of data transferred, all the VMs still have to be analyzed for changes, wasting precious backup window time. Finally, a storage system without VM granularity has to replicate everything on the volume to the remote DR site. During a recovery, this forces IT to filter through all of the replicated VMs to find and recover the right ones.
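To make the difference concrete, here is a minimal Python sketch of the idea, not any vendor's actual implementation. The tier names, the `POLICY` table, the VM names and the sizes are all hypothetical, chosen only to illustrate how much data ends up at the DR site when replication is decided per volume versus per VM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    name: str
    tier: str      # hypothetical tiers: mission_critical, business_important, nice_to_have
    size_gb: int
    volume: str


# Hypothetical per-tier protection policy (values are illustrative only).
POLICY = {
    "mission_critical": {"snapshot_minutes": 15, "replicate": True},
    "business_important": {"snapshot_minutes": 240, "replicate": False},
    "nice_to_have": {"snapshot_minutes": 1440, "replicate": False},
}


def volume_level_replication_gb(vms):
    """Volume granularity: a volume holding any VM that must be
    replicated is replicated whole, including its less important VMs."""
    replicated_volumes = {vm.volume for vm in vms if POLICY[vm.tier]["replicate"]}
    return sum(vm.size_gb for vm in vms if vm.volume in replicated_volumes)


def vm_level_replication_gb(vms):
    """VM granularity: only the VMs whose tier calls for replication are sent."""
    return sum(vm.size_gb for vm in vms if POLICY[vm.tier]["replicate"])


vms = [
    VM("erp-db", "mission_critical", 500, "vol-a"),
    VM("intranet-wiki", "business_important", 200, "vol-a"),
    VM("test-sandbox", "nice_to_have", 300, "vol-a"),
    VM("print-server", "nice_to_have", 100, "vol-b"),
]

print(volume_level_replication_gb(vms))  # 1000
print(vm_level_replication_gb(vms))      # 500
```

In this made-up inventory, one mission critical VM drags its two neighbors on the same volume to the DR site, doubling the replicated capacity compared with a per-VM decision.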
A second problem with limited VM visibility is understanding VM interrelationships. For example, many applications need a DNS server and a directory server to be restored and running before they can start. If the data for DNS is on Volume B and the data for the directory is on Volume C, but the organization is only replicating Volume A (where the application is), then the DR attempt may fail, or at least be slowed, while the DNS and directory servers are restored from backup.
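The dependency problem can be checked ahead of time. The following Python sketch, built on assumptions, walks a hypothetical dependency map to find every VM the first wave needs, then flags the ones that sit on volumes that are not being replicated. The function names, the dependency map and the volume layout are invented for illustration and mirror the DNS and directory example above.

```python
def dependency_closure(first_wave, depends_on):
    """Return the first-wave VMs plus everything they depend on,
    directly or indirectly."""
    needed = set()
    stack = list(first_wave)
    while stack:
        vm = stack.pop()
        if vm in needed:
            continue
        needed.add(vm)
        stack.extend(depends_on.get(vm, ()))
    return needed


def unprotected(needed, vm_volume, replicated_volumes):
    """List the needed VMs whose volume is not replicated to the DR site."""
    return sorted(vm for vm in needed if vm_volume[vm] not in replicated_volumes)


# Hypothetical layout matching the example in the text.
depends_on = {"app": ["dns", "directory"], "directory": ["dns"]}
vm_volume = {"app": "A", "dns": "B", "directory": "C"}

needed = dependency_closure(["app"], depends_on)
missing = unprotected(needed, vm_volume, replicated_volumes={"A"})
print(missing)  # ['directory', 'dns']
```

A non-empty result means the recovery would stall waiting on restores from backup, which is exactly the situation a DR plan wants to discover before the disaster rather than during it.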
A third problem is cost. For most organizations a full-scale disaster will be a rare experience. Obviously, the impact of that rare event is grave enough that it has to be planned for, but its rarity also means it makes sense to be wise, rather than speculative or naïve, with DR site expenditures. This is especially true given that the first wave of recovered applications is relatively small, which makes replicating and hosting everything at the DR site even harder to justify. To make matters worse, the incidents IT actually deals with are often human errors, such as batch-deleting some critical VMs. A cost-effective way to handle these “not-so-rare” scenarios becomes increasingly important and visible within an organization.
Empowered by a VM-aware storage system, organizations can design the DR site to handle only that first wave. All subsequent applications, whether business important or nice to have, are recovered as additional equipment is ordered in. The value of a storage system that understands the VMs it stores is that only the mission critical VMs need to be replicated and kept in a ready-state at the DR site. The rest can be protected by the backup application on cost-effective backup storage or even tape. As a result, VM granularity leads to the purchase of a smaller, faster-responding storage system and less server hardware at the DR site.
Successful recovery from a disaster requires a scalpel, not a saw. Being able to surgically identify just the VMs needed to return an organization to operation is vital to focusing IT responders on the task at hand. At the same time, this precision keeps the cost of the DR site under control. Most mission critical workloads are now virtualized and intermingled with less important applications and data. Storage systems with VM understanding allow for easier identification and isolation of the most important VMs during a disaster and are critical to successful and affordable DR.
Sponsored by Tintri