Like death and taxes, growing backup windows seem to be an inevitable fact of life. To stay one step ahead of the backup window, IT architects have resorted to myriad tactics. The most popular tactic in recent years has been utilizing a disk-based backup solution that leverages deduplication for storage efficiency. While these solutions can help improve the cost per GB of disk backup, they can also introduce latency and extend backup processes.
Selecting a disk-based backup solution should be more than going with the “default” vendor. The architecture of the disk backup system directly impacts the backup window. Understanding and selecting the right disk backup architecture is critical to designing a backup process that meets the backup window today and, more importantly, in the future.
Disk and Tape Conundrum
Many organizations have utilized disk-to-disk-to-tape (D2D2T) backup solutions to decrease backup windows and enable fast copies to tape media. Rotational disk drives can handle multiple backup streams from backup applications for rapid backups and can efficiently stream tape drives to allow for fast copies to tape.
The first challenge with D2D2T infrastructure, however, is that as backup data grows, the front-end disk storage repository also has to grow to accommodate it. For example, a 10TB environment needs 20TB of disk storage to manage one week’s worth of backup data (one full and four daily incremental copies). To store an additional week’s worth of data, this figure must double to 40TB, and so on. As a result, most organizations can typically only afford to keep one to two weeks’ worth of backups on disk.
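To make the arithmetic concrete, here is a minimal sketch in Python of the sizing math described above. The 25% daily change rate is an illustrative assumption chosen so the output matches the article’s 10TB/20TB example; actual retention needs vary by environment.

```python
def d2d2t_disk_needed(primary_tb: float, weeks: int,
                      daily_change: float = 0.25,
                      incrementals_per_week: int = 4) -> float:
    """Front-end disk needed for a D2D2T repository with no deduplication.

    Each retained week holds one full backup plus daily incrementals,
    all stored in native (uncompressed, non-deduplicated) form.
    """
    full = primary_tb
    incrementals = incrementals_per_week * daily_change * primary_tb
    return weeks * (full + incrementals)

# The article's example: a 10TB environment, one week retained -> 20TB,
# two weeks -> 40TB. Capacity doubles with every added week of retention.
for weeks in (1, 2, 4):
    print(weeks, "week(s):", d2d2t_disk_needed(10, weeks), "TB")
```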
The second challenge is that D2D2T infrastructure is still very tape-centric. In order to satisfy data off-siting requirements for DR purposes, data has to be exported to tape nightly and then transported via courier to an alternate location. Replicating uncompressed data over WAN links is simply not an option, given the bandwidth that would be required to move this information.
Dedupe Driven Delays
The challenges of D2D2T have sent many IT planners in search of an alternative. Disk-based data deduplication addresses these issues: multiple weeks’ worth of backup data can be efficiently stored on disk in a highly compressed, deduplicated format. From week to week, there is typically only about a 2% data change rate. Because deduplication stores and replicates only the changed data, it is efficient in both disk storage and WAN bandwidth: less disk is required to retain backups, and backup data can be replicated offsite over WAN links to electronically vault it. This enables many organizations to dramatically reduce their reliance on tape.
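A minimal sketch of why this is so efficient, assuming the roughly 2% week-over-week change rate cited above: only the first full backup is stored whole, and each subsequent week adds (and replicates) only the changed data. The function and figures are illustrative, not a model of any particular appliance.

```python
def dedup_footprint_tb(primary_tb: float, weeks_retained: int,
                       weekly_change: float = 0.02) -> tuple[float, float]:
    """Estimate deduplicated repository size and weekly replication volume.

    The first full backup is stored once; each later week adds only the
    changed data (the article's ~2% week-over-week change rate).
    """
    weekly_delta = weekly_change * primary_tb
    stored = primary_tb + (weeks_retained - 1) * weekly_delta
    return stored, weekly_delta

# A 10TB environment retaining 12 weeks: roughly 12.2TB on disk instead
# of 12 weeks of native copies, and only ~0.2TB of changed data needs to
# traverse the WAN each week.
stored, delta = dedup_footprint_tb(10, 12)
print(f"on disk: {stored:.1f}TB, replicated weekly: {delta:.1f}TB")
```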
It is important to note, however, that most organizations don’t replace tape in its entirety. They still make tape-based copies on a monthly or quarterly basis for an extra layer of protection and for additional data retention. This is especially true in mid-tier and small enterprise data centers. Many of these organizations can’t justify the costs associated with implementing electronic data vaulting, such as bandwidth, secondary disk backup appliances and secondary data center infrastructure. For these organizations, tape is still the primary method of moving data off-site, and the ability to create a rapid tape copy is critical to the protection process.
While data deduplication provides significant benefits, it can also have some unintended consequences. Data deduplication is a computationally and memory-intensive process, and as a result, it is possible for backup and recovery windows to actually increase when deduplication is performed. In other words, the added latency of deduplicating data in CPU and memory before it safely lands on disk storage can elongate backup windows. Moreover, as we will discuss in our next article, “How Data Deduplication Impacts Recovery”, deduplication can also add latency to the recovery window and impact application recovery time objectives (RTOs).
Scaled-Up Latency
To compensate for the added latency of deduplication, some disk backup appliances over-provision CPU and memory to speed up deduplication processing. But this only serves to inflate the cost of these offerings and reduce their return on investment (ROI) when compared to disk and tape backup systems. This is especially true of “scale-up” disk backup appliance architectures that require an up-front investment in the maximum CPU and memory resources supported by the array.
In scale-up storage architectures, all the CPU, memory and disk are contained within a single chassis or frame. In these designs, storage I/O performance follows a bell curve: performance gradually increases as disk drives are added to the array, but eventually plateaus and then declines as the array approaches maximum disk capacity. More critically, performance does not increase as the data and the resulting deduplication load grow, so the backup window continues to expand. The only way to then reduce the backup window is to replace the front-end controller with a larger, faster system.
This need to periodically refresh scale-up backup appliances with newer models places an increased burden on IT operational staff, who must plan for and manage the upgrades. In addition, scale-up designs often carry a higher total cost of ownership (TCO), especially when compared to alternatives like scale-out systems.
Scaled-Out Performance
Scale-out disk systems are a completely different architectural paradigm from traditional scale-up disk platforms. They consist of independent appliance nodes, each containing its own discrete storage, CPU, bandwidth and memory. The appliance nodes are then automatically integrated via software to form a GRID architecture.
With scale-out systems, there is no need to over-provision CPU and memory resources in the initial deployment. Instead, the system can start out small and be gradually scaled out over time to meet additional backup capacity requirements as the environment grows. One of the major benefits of this approach is that it dramatically simplifies the upgrade process. In fact, older appliances can typically be intermixed with newer appliances, so there is never a requirement to perform a forklift upgrade. This helps organizations maintain investment protection in their backup storage assets.
A second benefit of scale-out backup systems is that as the data and the resulting deduplication load grow, the appropriate resources (processor, memory and bandwidth) are added along with the capacity. This keeps the backup window at a fixed length, a key advantage for organizations that struggle to contain their backup windows.
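The contrast between the two architectures reduces to simple arithmetic: the backup window is the data to be moved divided by aggregate ingest throughput. The sketch below illustrates this under a purely hypothetical 2TB/hour per-node ingest rate; a scale-up appliance is the single-node case.

```python
def backup_window_hours(data_tb: float, nodes: int,
                        ingest_tb_per_hour_per_node: float = 2.0) -> float:
    """Backup window = data to move / aggregate ingest throughput.

    A scale-up appliance is the nodes=1 case: its controller's throughput
    is fixed, so the window stretches as data grows. A scale-out GRID adds
    a node's worth of CPU, memory and bandwidth with each capacity step.
    """
    return data_tb / (nodes * ingest_tb_per_hour_per_node)

# Data doubles from 20TB to 40TB. Scale-up: the window doubles too.
print(backup_window_hours(20, nodes=1))  # 10.0 hours
print(backup_window_hours(40, nodes=1))  # 20.0 hours

# Scale-out: adding a second node alongside the new capacity holds the
# window at its original length.
print(backup_window_hours(40, nodes=2))  # 10.0 hours
```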
With scale-out systems like those from ExaGrid, full server appliances can be added into a scalable GRID. All appliances, onsite and offsite, are viewed through a single user interface, and data is automatically load balanced across all appliance nodes. Appliances come in various sizes, allowing you to add the right amount of compute and capacity as you need it.
Hybrid Backup Flexibility
To further enhance backup and recovery speeds, some scale-out backup architectures incorporate a separate disk “landing zone” of native disk that is unencumbered by the latency of data deduplication. Backups are sent directly to this disk, bypassing the compute-intensive deduplication process during the backup itself, which helps to speed up backup times. These systems maintain seven days of backup data on native disk storage and then migrate data older than seven days to a deduplication partition for longer-term data retention.
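In effect, this is an age-based tiering policy. The sketch below shows the idea as a simple nightly job; the seven-day threshold comes from the description above, but the directory paths are hypothetical, and real landing-zone appliances manage this migration internally.

```python
import shutil
import time
from pathlib import Path

RETENTION_DAYS = 7  # backups younger than this stay on native disk

def tier_backups(landing_zone: Path, dedup_zone: Path) -> None:
    """Nightly job: move backup images older than the retention window
    from the native-disk landing zone into the deduplication partition.

    Recent backups stay in full native format for fast restores and
    rapid tape copies; older ones are handed off for long-term retention.
    """
    cutoff = time.time() - RETENTION_DAYS * 86400
    for image in landing_zone.iterdir():
        if image.stat().st_mtime < cutoff:
            shutil.move(str(image), str(dedup_zone / image.name))

# Hypothetical mount points for the two storage zones.
tier_backups(Path("/backup/landing_zone"), Path("/backup/dedup_zone"))
```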
Since backup jobs do not have to go through the data dehydration (deduplication) process, backup copies land very quickly on disk. And since the most recent backups are always held in their complete, non-deduplicated native format in the landing zone, offsite tape copies can be made quickly without a lengthy rehydration step. This is particularly important for organizations that plan to continue using tape as their primary means of off-siting critical business data. Organizations that want to eliminate tape at the disaster recovery site can instead replicate deduplicated data to a secondary offsite system; since only deduplicated data traverses the WAN, this is extremely bandwidth efficient.
A third benefit of maintaining a disk landing zone is the ability to take advantage of new instant recovery features available from virtual backup application providers like Veeam and Dell vRanger. These backup suppliers now offer the capability of booting a virtual machine (VM) directly off a backup image.
In order for this feature to work, however, the data must first be in its full native format. Backup systems that only maintain backup images in a compressed or deduplicated format must first reconstitute the data into its native format before the VM can use it as a mount point. This can delay the recovery process, in some instances by hours, whereas an image in a disk landing zone is immediately available for recovery.
Conclusion
The widespread adoption of scale-out storage architectures in primary storage environments is clear evidence that a new paradigm is needed to help organizations scale storage capacity and performance more linearly. Likewise, backup storage infrastructure needs to evolve to meet these same challenges. Compute-intensive processes like data deduplication arguably make GRID-based scale-out designs even more critical for effectively scaling backup infrastructure.
The challenges of effectively and efficiently managing unrelenting data growth and extended backup windows may require a break from traditional backup storage architecture design. GRID-based scale-out backup architectures, like those from ExaGrid, combine native disk for rapid backups and recoveries with a separate deduplication storage zone. This helps organizations flexibly grow their backup infrastructure without resorting to time-consuming and costly forklift upgrades.
ExaGrid is a client of Storage Switzerland
