Most VMware projects start off by using the backup applications that were in place before the virtualization effort. IT planners then realize that they need something more and turn to VM-specific backup applications. That raises another decision: whether to use a backup product that requires agents or one that is agent-less.
What Is an Agent?
The first point to understand is what exactly an agent is and what purpose it serves in the backup process. Agents were first used in the early days of backup and were implemented to ‘prepackage’ backup streams. Essentially, they encapsulated the thousands of small files that made up a backup job into a single object or set of objects so they could be transferred more efficiently across a network. Agents also converted that data into a format that the backup application expected.
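As a rough illustration of the pre-packaging idea, the sketch below bundles many small files into one object before transfer. It uses Python's standard `tarfile` module purely as a stand-in; real backup agents use their own formats, and the function names `package_files` and `unpack_files` are hypothetical.

```python
import io
import tarfile


def package_files(files):
    """Bundle a mapping of {name: bytes} into one in-memory tar archive.

    Stand-in for an agent 'prepackaging' many small files into one object.
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)  # tar headers need the size up front
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def unpack_files(blob):
    """Recover the {name: bytes} mapping from the packaged object."""
    result = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as archive:
        for member in archive.getmembers():
            result[member.name] = archive.extractfile(member).read()
    return result
```

The receiving side handles a single stream rather than thousands of separate transfers, which is the efficiency gain the early agents were after.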
Eventually, the role of agents grew to include application-specific intelligence to properly protect applications such as Oracle, Exchange or SQL. This application-level awareness also enabled the user to perform granular recoveries of specific items within those applications' data.
Agents also come with some negatives. First, they require resources from the server they’re running on so that all of this pre-packaging can be done. The file system must be scanned, often in a time-consuming, item-by-item manner, to find files that need to be protected. As file counts on servers grew, this process, called a “file system walk”, became a significant part of the backup process, so much so that this one step began to affect backup windows. In extreme cases, it took longer to walk the file system than to transfer the data.
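A minimal sketch of a file system walk, assuming the agent decides what to protect by comparing each file's modification time with the time of the last backup. Real agents may use other criteria, and `walk_for_changes` and `demo` are illustrative names, not part of any product.

```python
import os
import tempfile


def walk_for_changes(root, since):
    """Visit every file under root and collect those modified after `since`.

    Returns (changed_paths, files_scanned). Every file must be stat'ed
    even when only a handful have changed, which is why the walk is slow.
    """
    changed = []
    scanned = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            scanned += 1
            if os.stat(path).st_mtime > since:
                changed.append(path)
    return changed, scanned


def demo():
    """Build a throwaway tree of five files, one of them recently modified."""
    with tempfile.TemporaryDirectory() as root:
        for i in range(5):
            path = os.path.join(root, f"file{i}.txt")
            with open(path, "w") as handle:
                handle.write("data")
            os.utime(path, (1000, 1000))  # pretend it was modified long ago
        os.utime(os.path.join(root, "file3.txt"), (5000, 5000))
        changed, scanned = walk_for_changes(root, since=2000)
        return [os.path.basename(p) for p in changed], scanned
```

In the demo, all five files are examined to find the one that changed. The cost of the scan grows with the total file count, not with the amount of changed data.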
In addition, agents had to be individually purchased, deployed and maintained, all of which raised both the total cost of the backup solution and its operational complexity.
Despite these negatives, agent-based backup was the dominant form of data protection in the pre-virtualized, physical data center. It made sense then, especially for legacy backup applications, that the use of agents would continue into the virtualized infrastructure. The question today is whether an ‘agented’ approach to backup still makes sense in a modern virtualized infrastructure.
What Types of Agent Options Are There?
In the virtual environment there tend to be three options for protecting the data within a virtual machine: a data mover agent option, a recovery-assisted agent option and an agent-less option.
Option 1: Data Mover Agent
The traditional data mover agent closely resembles the legacy agents described above. It’s installed inside the virtual machine and performs functions like scanning for files that need protection, packaging those files into a single object and transferring them to the backup server.
The data mover agent also shares all the challenges of legacy agented backup. It resides in the virtual machine and consumes resources as it does its tasks. But the problem is made worse in the virtual environment since dozens of these agents could be running on a single host, all demanding portions of a limited pool of resources.
Another problem with this agented approach is that it’s really unnecessary in virtualized environments. The hypervisor can already determine, at a block level, which parts of a virtual disk have changed and which have not. Also, the virtual image is already encapsulated; there is no need for the pre-packaging step described above.
Finally, since the data mover agent option is file based, it makes full VM recovery burdensome. After the virtual machine instance is created, the operating system needs to be manually installed with the appropriate drivers and configuration settings. Then the data needs to be restored, file by file. This is inefficient in the virtual environment since the virtual machine image is already encapsulated and recovery should be as simple as copying one file (the VM image) back into place.
Option 2: The Recovery Agent
The second type of agent is the recovery agent. This agent does not package and transfer the data like the data mover agent does and is typically not active during backup. Its purpose is to provide application-aware backup and granular restores of data. This agent essentially sleeps, waiting to be awakened by an application-specific backup or recovery request. When such an event occurs, it consumes resources to “crack” open the backup data and extract the individual components that need to be recovered.
This agent is also the source of some confusion for customers when selecting a backup product. Some vendors incorrectly call this “agent-less” backup, a term that properly applies only to the third option described below. While this agent typically has very little impact on resource consumption, there is still a piece of software installed, and it does add to the operational overhead of the environment.
A backup administrator has to install the agents and make sure each one works with that VM’s workload of applications and data. The administrator must also maintain each agent and ensure that, as the application or operating system within the VM is upgraded, the agent is upgraded in lockstep. For the backup admin this can mean breaking out the spreadsheet and developing a complicated grid of backup agent versions, application versions and operating system versions.
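The version grid can be pictured as a lookup table that has to be rechecked after every upgrade. The sketch below is only an illustration: the agent, application and operating system names are made up, and `unsupported_vms` is a hypothetical helper rather than a feature of any backup product.

```python
# Hypothetical support matrix: agent version -> supported (application, OS) pairs.
SUPPORT_MATRIX = {
    "agent-1.0": {("db-9", "os-5")},
    "agent-2.0": {("db-9", "os-5"), ("db-10", "os-6")},
}


def unsupported_vms(inventory, matrix=SUPPORT_MATRIX):
    """Return names of VMs whose agent does not cover their app/OS pair."""
    problems = []
    for vm in inventory:
        supported = matrix.get(vm["agent"], set())
        if (vm["app"], vm["os"]) not in supported:
            problems.append(vm["name"])
    return problems


# Made-up inventory: vm2 was upgraded but its agent was not.
INVENTORY = [
    {"name": "vm1", "agent": "agent-1.0", "app": "db-9", "os": "os-5"},
    {"name": "vm2", "agent": "agent-1.0", "app": "db-10", "os": "os-6"},
    {"name": "vm3", "agent": "agent-2.0", "app": "db-10", "os": "os-6"},
]
```

Every application or operating system upgrade changes a row in the inventory, so the check has to be rerun and the matrix kept current by hand.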
Option 3: Agent-less Protection
The third option is one employed by companies like Veeam. This option leverages the capabilities of a modern hypervisor-centric data center instead of dragging old technology into it. This is a true agent-less approach that is built for virtualization; it requires no software to be installed on the host or in the virtual machine.
The backup software interfaces with the hypervisor API or SAN infrastructure directly to access the VM information that needs to be protected. During backup, Veeam transfers only changed blocks using “changed block tracking” technology. This process opens up new possibilities in the frequency at which data protection can occur. For example, backups can now run as often as every 15 minutes with minimal impact on virtual machine performance.
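The idea behind block-level change tracking can be shown with a toy model. This is a sketch under simplifying assumptions, not the hypervisor's or Veeam's actual implementation: the disk is a byte array, the block size is unrealistically small, and `TrackedDisk` and `apply_incremental` are hypothetical names.

```python
BLOCK_SIZE = 4  # hypothetical, tiny so the example is easy to follow


class TrackedDisk:
    """Toy virtual disk that remembers which blocks were written."""

    def __init__(self, size_blocks):
        self.data = bytearray(size_blocks * BLOCK_SIZE)
        self.dirty = set()  # indexes of blocks changed since the last backup

    def write(self, offset, payload):
        if not payload:
            return
        if offset < 0 or offset + len(payload) > len(self.data):
            raise ValueError("write outside the disk")
        self.data[offset:offset + len(payload)] = payload
        first = offset // BLOCK_SIZE
        last = (offset + len(payload) - 1) // BLOCK_SIZE
        self.dirty.update(range(first, last + 1))

    def incremental_backup(self):
        """Copy only the changed blocks, then reset the tracking set."""
        blocks = {
            i: bytes(self.data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE])
            for i in sorted(self.dirty)
        }
        self.dirty.clear()
        return blocks


def apply_incremental(image, blocks):
    """Patch a previous full image (a bytearray) with the changed blocks."""
    for index, chunk in blocks.items():
        image[index * BLOCK_SIZE:(index + 1) * BLOCK_SIZE] = chunk
```

No scan of the disk contents is needed at backup time. The set of changed blocks is maintained as writes happen, so the cost of an incremental backup depends on how much changed rather than on how much data exists.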
Being agent-less does not mean less backup integrity or the loss of granular recoveries. In fact, by leveraging native virtualization capabilities, agent-less products should be able to provide more powerful and easier-to-use data protection features. Depending on the vendor, these features tend to be available at a lower price point. Examples of these capabilities include the ability to start a server directly from the backup file for instant recovery and VM replication at no additional license cost.
For granular recovery of files and application items, agent-less solutions use several approaches. First, they can read the backup file and the guest OS file structure of the VM for file-level recovery. Second, they can start a VM instantly from the backup file in an environment isolated from production and recover data from any type of application, including proprietary homegrown apps. All of this work is done on the backup server, not the production server.
In the end the backup administrator has a job to do: make sure that the virtual environment is well protected and can be rapidly recovered. Whether the backup application uses agents, and which type, can seem like “just details”, but the option chosen can have a dramatic impact not only on the cost and operation of the backup process but also on its value to the organization.
Capabilities like those offered by agent-less solutions provide more than just backup and recovery. They provide a complete data protection solution that reduces the amount of time that an application will be down and the amount of data that will be lost. They also provide capabilities like VM testing and disaster recovery that were once the domain only of expensive HA solutions. As a result, for the highly virtualized environment, agent-less solutions deserve strong consideration.
Veeam is a client of Storage Switzerland