Thanks to an easy-to-measure return on investment (ROI), most data centers’ virtual server environments are growing rapidly. Many organizations have implemented a “virtualize first” policy, where all new servers are virtualized. In addition, legacy servers are being migrated to virtual instances as quickly as possible. While virtual environments have shown an incredible ability to scale and generate positive ROI, the data protection process has not. This data protection scaling issue threatens to slow virtualization adoption and to reduce its ROI.
To scale at the rate of virtualization growth, data protection solutions need to adopt new capabilities and simplify, but not abandon, current features. To keep their virtual environments scaling and generating ROI, IT planners need to take a fresh look at what they should expect from data protection solutions so that the data protection process can scale in lock-step with virtualization.
Virtual Data Protection Needs to be Flexible
The first requirement of a data protection solution designed to scale with the virtual environment is that it be flexible. Virtualization is unique in the world of IT projects in that organizations large and small are implementing it. Each of these organizations has different needs and skill sets which must be accommodated.
Organizations without data-protection-focused staffing will increasingly look to cloud-based solutions to compensate for some of their administrative shortfalls. While these organizations are smaller, the applications they run in their virtual environments are just as critical to them as they are to large enterprises with dedicated data protection staffs. They need complete protection and restoration feature sets, but delivered in a more automated and outsourced fashion.
Enterprises, on the other hand, will have the staffing to manage and support their own data protection operations. Because of their sheer size, they will need the capability to fine-tune their data protection processes to the specific needs of the business.
Because of these two valid yet contrasting realities, vendors will need to provide the market as a whole with options for data protection. However, most vendors try to shoehorn an enterprise application into a small business use case or to “grow” an SMB solution into an enterprise product. In many cases the resulting products are found lacking. These vendors could provide a greater service by offering entirely different solutions for these entirely different markets. For example, HP Autonomy has LiveVault for organizations that need a more outsourced, fully automated solution and HP Data Protector for enterprises that need customization and tuning.
Virtual Data Protection Needs to be Dynamic
Regardless of the size of the organization, the data protection process faces a common challenge: dealing with the dynamic nature of the virtual environment. Prior to virtualization, when a new server was installed the backup administrator was aware of it. Even if they were not told, they at least saw the new machine being racked, a process that typically took days to complete. This made it easier for the backup administrator to learn what the new application was and how it needed to be protected.
In the virtual environment these admins have no such luxury. A new server and application can be deployed without any physical change to the environment, and that virtualized server instance is often creating new data long before the backup administrator finds out that it’s been implemented. This gap between the time a new VM is created and the time it’s fully integrated into the backup process can result in data exposure and subsequent data loss.
In fact, in a VMware environment administrators can easily create hundreds of VMs in minutes, literally doubling or tripling their data center workload. This phenomenon is called “VM sprawl,” and most organizations have recognized it as a significant management issue. It also exacerbates the challenge of data protection, as the backup administrator now has more data to protect, in less time, with fewer resources.
The backup environment needs the ability to “sense” a new virtual machine being created and instantly apply a backup policy so data protection can occur. HP, for example, has added a one-touch backup policy so that as new VMs are added to the environment they are automatically protected. Later, as the backup administrator has time, these backups can be fine-tuned and/or sent to alternate backup destinations.
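The detect-and-protect loop described above can be sketched in a few lines. This is a minimal illustration of the idea, not any vendor’s API: the inventory source, VM names, and policy names are all hypothetical.

```python
# Sketch: reconcile the hypervisor's VM inventory against the backup
# configuration and "one-touch" protect anything that is unprotected.
# All names here (VMs, policies) are illustrative placeholders.

DEFAULT_POLICY = "default-daily"

def reconcile(inventory_vms, policies):
    """Detect VMs missing from the backup config and protect them."""
    unprotected = sorted(set(inventory_vms) - set(policies))
    for vm in unprotected:
        policies[vm] = DEFAULT_POLICY  # protect first, fine-tune later
    return unprotected

policies = {"web01": "hourly"}                      # already protected
newly = reconcile(["web01", "app02", "db03"], policies)
# newly == ["app02", "db03"]; both now carry the default policy
```

In practice this reconciliation would run continuously against the hypervisor’s inventory so that the window between VM creation and first backup stays small.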
Virtual Data Protection Needs to be Efficient
As the number of virtual machines per physical server increases (a metric known as VM density), the problems of backing those VMs up using traditional file-based methods become more severe. Each additional VM adds more data that must be processed by a shared CPU and transmitted across a shared network adapter. Backups need to be performed at a much more granular level to offload this processing and transmission. To assist in this process, hypervisor vendors like VMware have developed APIs for off-host backup at very granular levels.
Changed block tracking (CBT) is one such technology that allows for backup of just the blocks of data (sub-files) that have changed since the last backup. This capability dramatically reduces the amount of data that needs to be transferred to the backup device.
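The core of the CBT idea can be sketched simply: record which fixed-size blocks each disk write touches, then hand the backup application just that set. This is an illustrative model only; real changed block tracking is implemented inside the hypervisor, and the block size here is an assumption.

```python
# Minimal model of changed block tracking: writes mark blocks dirty,
# and a backup drains the dirty set, copying only those blocks.

BLOCK_SIZE = 4096  # illustrative block size

class ChangeTracker:
    def __init__(self):
        self.dirty = set()

    def record_write(self, offset, length):
        """Mark every block touched by a write of `length` bytes at `offset`."""
        first = offset // BLOCK_SIZE
        last = (offset + length - 1) // BLOCK_SIZE
        self.dirty.update(range(first, last + 1))

    def blocks_since_last_backup(self):
        """Return and reset the changed-block list, as a backup run would."""
        changed, self.dirty = sorted(self.dirty), set()
        return changed

tracker = ChangeTracker()
tracker.record_write(0, 100)        # touches block 0
tracker.record_write(8192, 5000)    # spans blocks 2 and 3
changed = tracker.blocks_since_last_backup()   # [0, 2, 3]
```

Only three of the disk’s blocks would cross the network on the next incremental pass, which is the source of the bandwidth savings the text describes.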
Prior to a changed block backup occurring, a snapshot of the data is taken. When changed blocks of data arrive on the backup device, the original blocks are secured in that snapshot. This gives changed block backups a point-in-time recovery capability, but at the expense of capacity.
To scale backup to meet the virtual environment means that the technology cannot stop there. It should also help control the size of the backup storage area, otherwise the cost of capacity for the backups could well exceed the cost of primary storage.
To curtail the exponential growth of backup storage, landed backup data should be deduplicated in addition to being reduced by CBT. While changed block tracking eliminates redundancy within each VM image, deduplication eliminates redundancy across VMs, and a VMware environment is replete with redundant information.
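The cross-VM redundancy point can be made concrete with a content-hash sketch: identical blocks (for example, shared guest-OS files in VMs cloned from the same template) are stored once and referenced by hash. This is a toy model of block-level deduplication, not a product implementation.

```python
# Sketch of block-level deduplication across VM images: each unique block
# is stored once under its content hash; each image keeps a "recipe" of
# hashes from which it can be rebuilt.
import hashlib

def dedup_store(images, block_size=4096):
    store, recipes = {}, {}
    for name, data in images.items():
        recipe = []
        for i in range(0, len(data), block_size):
            block = data[i:i + block_size]
            digest = hashlib.sha256(block).hexdigest()
            store.setdefault(digest, block)   # keep one copy per unique block
            recipe.append(digest)
        recipes[name] = recipe
    return store, recipes

# Two VMs cloned from the same template share most of their blocks.
base = b"A" * 8192
store, recipes = dedup_store({"vm1": base, "vm2": base + b"B" * 4096})
# Five raw blocks land, but only two unique blocks are stored.
```

The ratio between raw blocks received and unique blocks stored is exactly the capacity saving the article argues for.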
In addition to efficient backups, restores also need to be made efficient. If the backup technology chosen is cloud based, it should have a hybrid component that allows for rapid local restores without Internet latency.
Restores should also be granular. A VMware VM is essentially a server instance with all its files encapsulated into a single image file, so it’s critical that the backup administrator have the ability to “crack open” these images when a single item needs to be restored. Again, this is an efficiency issue: being able to “browse” the VM and select only the components needed for restoration saves time and network resources.
Enterprise backup applications should also leverage a technique similar to changed block tracking to restore only the blocks of files that have changed since the last backup or the required point in time. Once again, this minimizes the amount of downtime experienced.
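A changed-block restore can be sketched as the mirror image of the CBT backup: compare the current image to the backup copy block by block and rewrite only the blocks that diverged. This assumes, for simplicity, that the two images are the same length; it is an illustration of the technique, not a specific product’s restore path.

```python
# Sketch of a changed-block restore: only blocks that differ from the
# backup are rewritten, so an almost-intact disk restores almost instantly.

def restore_changed_blocks(current, backup, block_size=4096):
    restored = bytearray(current)
    changed = []
    for i in range(0, len(backup), block_size):
        if current[i:i + block_size] != backup[i:i + block_size]:
            restored[i:i + block_size] = backup[i:i + block_size]
            changed.append(i // block_size)
    return bytes(restored), changed

backup = b"A" * 4096 + b"B" * 4096
current = b"A" * 4096 + b"X" * 4096   # only the second block diverged
restored, changed = restore_changed_blocks(current, backup)
# changed == [1]; one block moved instead of the whole image
```

When only a small fraction of blocks has changed since the restore point, the data moved (and the downtime) shrinks in proportion.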
Data Protection Needs to be Integrated
In order for the data protection process to scale at the pace of the virtualized environment it also needs to be integrated with the various components of that infrastructure. Hypervisor integration as described above can enable features like changed block tracking and automatic VM detection.
Beyond the hypervisor, the data protection application should integrate with the applications running within those virtual machines. Even in a virtual environment, ensuring consistent-state backups is critical to data integrity. Fast, efficient backups are important, but if the data can’t be counted on during a recovery, the process is useless. Application integration also allows more frequent backups to be taken of mission-critical applications, narrowing the recovery point objective (RPO).
Finally, in order to scale, data protection applications need to integrate with the storage systems that the virtual environment counts on for hosting the VM images themselves. These storage systems are typically advanced shared devices with features like snapshots that the data protection application can take advantage of and provide management into.
Ideally, the data protection application would interface with the storage hardware to trigger a snapshot while leveraging its own application modules, described above, for a clean, consistent snapshot. The backup process would then execute the backup of this application data from the snapshot copy for an entirely off-host data protection strategy. In essence, “backups” can be taken near-continuously without impacting application performance.
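The orchestration just described has a fixed ordering worth spelling out: quiesce the application, take the array snapshot, resume the application, then run the heavy backup I/O off-host from the snapshot. The sketch below models that sequence with placeholder callables; none of these names correspond to a real storage or backup API.

```python
# Sketch of the off-host backup flow: the application pauses only for the
# (fast) snapshot, while the (slow) data movement happens off-host.
# All four hooks are hypothetical stand-ins for vendor integrations.

def offhost_backup(quiesce, resume, create_snapshot, ingest):
    quiesce()                       # flush the app to a consistent state
    try:
        snap = create_snapshot()    # array snapshot: near-instant
    finally:
        resume()                    # app resumes even if the snapshot fails
    ingest(snap)                    # backup reads from the snapshot, off-host
    return snap

events = []
snap = offhost_backup(
    quiesce=lambda: events.append("quiesce"),
    resume=lambda: events.append("resume"),
    create_snapshot=lambda: (events.append("snapshot"), "snap-001")[1],
    ingest=lambda s: events.append(f"ingest {s}"),
)
# events: quiesce -> snapshot -> resume -> ingest snap-001
```

The key design point is that the application is paused only for the snapshot, not for the backup itself, which is what makes near-continuous protection plausible.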
For the virtual environment, data protection has been viewed as a hindrance: a reason not to increase virtual machine density and a reason not to convert more rapidly to a fully virtualized environment. As applications like HP’s Data Protector and LiveVault evolve to provide greater flexibility, the ability to protect data across virtual and physical machines within one management interface, seamless extension from on-premises to the cloud, and deeper integration with various hypervisors, that view can change. Data protection can become an accelerant to virtualization adoption by providing safety every step of the way.
HP is a client of Storage Switzerland