Traditional backup processes were developed for local backup systems, not the cloud, and so were not designed to minimize the time required to complete backup jobs over a WAN. Most cloud backup solutions leverage this same technology, storing data locally and then transferring it to the cloud as a separate step. Due to the cost of available bandwidth and the inherent latency of traditional backup processes, most cloud backup solutions have included an on-site storage device or appliance.
The increased availability of cost-effective, 1Gbps+ Internet connections is making direct-to-cloud backup more practical, but the inefficiencies built into legacy backup architectures still stand in the way. Purpose-built, direct-to-cloud backup solutions can be the answer. By re-architecting the traditional backup process, they can reduce latency and eliminate the need for a local copy or local appliance, prompting the question, “Is hybrid cloud backup dead?”
Hybrid Cloud Backup – How We Got Here
Historically, the backup process pulled data from host servers and sent it directly to tape drives, and later to disk, but was controlled by a central backup server. Disk-to-disk backup made the multi-streaming of jobs simpler as data was sent to shared disk storage or backup appliances and consolidated. Deduplication on the target storage system was added as another step to reduce the amount of disk capacity consumed.
Making the cloud an off-site repository was a logical progression, but since data was already stored locally it meant adding another step to transfer backups over the WAN. Also, this two-step process decoupled servers from that WAN transfer to accommodate the slower bandwidth connections that were common. But instead of modifying an architecture originally designed for local backups, direct-to-cloud backup solutions start over and design the system from the ground up to send data to the cloud.
Direct-to-cloud backup solutions are software programs that run directly on the application servers they are protecting, handling the change detection (figuring out what data to back up), data reduction and transfer to the cloud repository without having to go through an appliance. This adds a great deal of parallelism to the backup streams, meaning the system can be more scalable and more efficient than traditional backup, which consolidates these processes into a single dedicated backup server or appliance, often creating a bottleneck.
Direct-to-cloud backup runs these steps on each protected server in parallel, distributing the backup workload across the data center. This eliminates network traffic to the local appliance and shortens the time required to get backups off-site. It’s also more reliable as there are no single points of failure such as the hybrid appliance, which can affect data protection for the entire organization if it goes down.
Making Direct-to-Cloud Backup Work
Hybrid cloud backup is an evolution of the traditional backup process that adds a cloud connection at the end. In contrast, direct-to-cloud solutions were designed from the beginning to skip the local backup step, essentially streaming data from the application server to the cloud. To make this work, they must be able to copy data from the server and send it over the WAN as fast as the disk drives can provide it and Internet bandwidth will allow. Now that affordable, 1Gbps+ Internet bandwidth is readily available, the backup process itself, rather than the link, tends to be the limiting factor, so traditional backups must be redesigned to accelerate the process and remove bottlenecks. The way to speed up backups is to make the overall process more efficient and to execute the component steps in a massively parallel fashion.
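To illustrate the requirement that data move as fast as the disks and the link allow, the following minimal sketch estimates a backup window from the slower of the two rates. The function name and the sample figures are assumptions made for illustration; they are not measurements from any product, and the model ignores protocol overhead and data reduction.

```python
def backup_window_hours(data_tb: float, disk_mb_per_s: float, link_gbps: float) -> float:
    """Estimate the hours needed to stream data_tb terabytes off-site.

    Assumes decimal units (1 TB = 10**12 bytes, 1 Gbps = 10**9 bits/s)
    and ignores protocol overhead, data reduction and contention.
    """
    link_bytes_per_s = link_gbps * 1e9 / 8        # bits/s -> bytes/s
    disk_bytes_per_s = disk_mb_per_s * 1e6
    effective = min(disk_bytes_per_s, link_bytes_per_s)  # slower stage sets the pace
    return data_tb * 1e12 / effective / 3600


# Hypothetical server: 2 TB to protect, disks stream 200 MB/s, 1 Gbps uplink.
hours = backup_window_hours(2, 200, 1)
print(f"{hours:.2f} h")
```

In this hypothetical case the 1Gbps link (125 MB/s) is slower than the disks, so the link sets the window at roughly 4.4 hours. With a faster link the disks would become the limit instead.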
Faster Change Detection
Efficiency starts with change detection, the pre-processing of data on the host servers to determine which objects need to be backed up. Traditional backup software looks at which files have been modified or which blocks have changed, but this requires a lot of CPU cycles, which slows the process down.
Technologies are now available to abstract data objects that are candidates for backup into digital signatures, and compare those with a local cache of signatures representing data that’s already in the backup repository. This comparison process is faster than traditional change detection methods and can actually keep up with the streaming rate of disk drives. This means that changed data can be identified for backup as fast as the disk drive can send that data out of the server, essentially eliminating change detection as a separate step in the backup process.
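The signature comparison described above can be sketched roughly as follows. This is an illustration under stated assumptions, not a description of any vendor's implementation: SHA-256 stands in for the "digital signature", data is split into fixed-size chunks, and the local cache is a plain in-memory set. The names signature and changed_chunks are hypothetical.

```python
import hashlib


def signature(chunk: bytes) -> str:
    """Digest standing in for the 'digital signature' (SHA-256 is an assumption)."""
    return hashlib.sha256(chunk).hexdigest()


def changed_chunks(data: bytes, cache: set[str], chunk_size: int = 4096):
    """Yield (offset, chunk) pairs whose signature is not yet in the local cache.

    Fixed-size chunking is a simplification chosen for this sketch.
    """
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        sig = signature(chunk)
        if sig not in cache:
            cache.add(sig)  # remember it so later duplicates are skipped
            yield offset, chunk


cache: set[str] = set()
first = b"A" * 4096 + b"B" * 4096
second = b"A" * 4096 + b"C" * 4096  # only the second chunk differs

sent_first = list(changed_chunks(first, cache))    # both chunks are new
sent_second = list(changed_chunks(second, cache))  # only the "C" chunk is new
print([off for off, _ in sent_first], [off for off, _ in sent_second])
```

The second pass sends only the chunk at offset 4096, because the signature of the unchanged first chunk is already in the cache. A production system would presumably add a signature to the cache only after the upload is confirmed.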
Faster WAN Transfers
Data transfer to the cloud is an obvious place to improve efficiency and increase backup speed. WAN optimization is the general term for processes that make the best use of available bandwidth. These include “packet shaping” and other data conditioning technologies.
Direct-to-cloud backup runs on the application server and can leverage an understanding of its applications and the data objects they create to optimize WAN transfer. An example of this is dynamic TCP/IP window sizing that adapts to these different data types and better fills each packet.
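The article does not say how the window size is chosen. One widely used rule of thumb, offered here as an assumption rather than as the method any product uses, is the bandwidth-delay product: the amount of data that must be in flight to keep a link of a given speed and round-trip time fully used. The function name and sample values below are illustrative.

```python
def bandwidth_delay_product_bytes(link_gbps: float, rtt_ms: float) -> int:
    """Bytes that must be in flight to keep a link of this speed full."""
    bits_in_flight = link_gbps * 1e9 * rtt_ms / 1000  # bits/s * seconds
    return round(bits_in_flight / 8)


# Hypothetical 1 Gbps link with a 40 ms round trip to the cloud data center.
bdp = bandwidth_delay_product_bytes(1, 40)
print(bdp)
```

For these sample values the result is 5,000,000 bytes, far more than the 65,535 bytes a TCP window can express without window scaling, which is why window sizing matters on fast, long-distance links.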
Another way to speed Internet file transfer is to make that process more robust. During transmission a file is parsed into packets, which are sent to the destination server (the cloud backup location) and reassembled into the original file. Retransmission occurs when packets are lost en route, but also when packets are received out of order, something that’s fairly common given the variability of routing on the Internet. Direct-to-cloud backup systems can help manage this process by sharing a “map” of sub-file components that enables the destination server to more easily reassemble out-of-order packets into complete files and reduce retransmissions.
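The source does not describe the format of this “map”, so the following is only a sketch of the general idea. It works at the application level on sub-file parts rather than on TCP packets, and the manifest fields (index, offset, length, sha256) and the function names are hypothetical. Because the receiver knows where each part belongs, arrival order does not matter, and only a corrupt or missing part needs to be requested again.

```python
import hashlib
import random


def build_manifest(data: bytes, part_size: int):
    """Split data into parts and describe each by index, offset, length, digest."""
    parts, manifest = [], []
    for index, offset in enumerate(range(0, len(data), part_size)):
        part = data[offset:offset + part_size]
        parts.append((index, part))
        manifest.append({"index": index, "offset": offset, "length": len(part),
                         "sha256": hashlib.sha256(part).hexdigest()})
    return manifest, parts


def reassemble(manifest, arrivals) -> bytes:
    """Place each part at its manifest offset, whatever order it arrives in."""
    total = sum(entry["length"] for entry in manifest)
    buffer = bytearray(total)
    seen = set()
    for index, part in arrivals:
        entry = manifest[index]
        if hashlib.sha256(part).hexdigest() != entry["sha256"]:
            raise ValueError(f"part {index} is corrupt; request it again")
        buffer[entry["offset"]:entry["offset"] + entry["length"]] = part
        seen.add(index)
    missing = [e["index"] for e in manifest if e["index"] not in seen]
    if missing:
        raise ValueError(f"missing parts: {missing}")
    return bytes(buffer)


payload = bytes(range(256)) * 40  # 10,240 bytes of sample data
manifest, parts = build_manifest(payload, part_size=1024)
random.shuffle(parts)             # simulate out-of-order arrival
restored = reassemble(manifest, parts)
print(restored == payload)
```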
Parallel vs. Serial
Hybrid appliances serialize change detection, deduplication and WAN optimization by running these processes on the appliance, often using lower-performing server hardware. Direct-to-cloud backup runs the entire backup process on the application servers, parallelizing the component steps and leveraging the abundant CPU power that’s typically available on each server. This removes the potential for a bottleneck at the backup appliance and can greatly reduce overall backup windows. But there’s another benefit to disaggregating the backup process as well.
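A simple model makes the difference between the two approaches concrete. It assumes the worst case for the appliance, in which jobs are handled one after another, and it assumes that parallel jobs share a single WAN link. The function names and all figures are hypothetical.

```python
def serial_window(job_hours):
    """One appliance processes every server's job back to back."""
    return sum(job_hours)


def parallel_window(job_hours, job_gb, link_gb_per_hour):
    """Each server runs its own job; the shared WAN link caps the total rate."""
    processing_bound = max(job_hours)              # slowest single server
    link_bound = sum(job_gb) / link_gb_per_hour    # all data through one link
    return max(processing_bound, link_bound)


# Four hypothetical servers: hours of processing and GB to send for each.
job_hours = [2.0, 3.0, 1.5, 2.5]
job_gb = [300, 500, 200, 400]
link = 450.0  # 1 Gbps is about 125 MB/s, or about 450 GB per hour

serial = serial_window(job_hours)
parallel = parallel_window(job_hours, job_gb, link)
print(f"serial: {serial:.1f} h, parallel: {parallel:.1f} h")
```

With these figures the serialized window is 9.0 hours and the parallel window about 3.1 hours. In the parallel case the shared link, not any single server, sets the limit, which is why parallel jobs remain bounded by the available bandwidth.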
Many companies don’t consider a backup fully complete until the data is securely stored off-site. For years this meant waiting until the truck delivered backup tapes to the vault. With hybrid systems it means waiting until the latest backup has been successfully sent to the cloud. Direct-to-cloud solutions allow each server to conduct its own backup jobs at the same time (within available bandwidth), speeding up completion of the overall process. Server-side backup control also allows critical applications to be updated in the cloud more frequently, a process that’s more complicated when backups are consolidated in a hybrid appliance.
Hybrid cloud backup appliances have become the standard technology for cloud backup, largely because cloud backup has been an outgrowth of traditional, on-site backup technologies. Direct-to-cloud backup solutions were designed from the start to send data to the cloud and skip this on-site step.
By leveraging improvements in data change detection and WAN transfer, plus the abundance of CPU power, server-side direct-to-cloud solutions are able to stream backups to the cloud as fast as most servers can output the data. Combined with affordable, high-bandwidth Internet connections, this results in faster, more reliable, lower-cost server backups and may eventually spell the end for hybrid cloud technologies in corporate application server backup.
Sponsored by Zetta.net
Leveraging technologies to increase parallelism and reduce latency in the backup process, Zetta’s DataProtect offers direct-to-cloud backups that are faster than most hybrid backup solutions, with up to 5TB backed up in 24 hours. With its SSAE-16-audited data centers, Zetta.net provides enterprise-grade backup and DR-as-a-Service for enterprises, businesses and MSPs, with 99.99996% reliability and 100% recoverability.