Cloud backup has the potential to ease backup pain while also lowering costs significantly. Most cloud backup solutions focus on lowering costs. They also claim to simplify operations, but many are very similar to their on-premises counterparts and don’t address the actual problem associated with data protection.
What’s the Backup Problem?
Scalability is a chief concern for backup architectures. Most vendors and IT professionals focus on creating architectures that can scale capacity to near-infinite levels. Capacity is indeed a legitimate concern, but many backup solutions promise capacities that far exceed most organizations’ needs.
Scaling the architecture of the backup solution is the real challenge. Backup solutions need to track thousands of endpoints, hundreds of virtual machines and dozens of bare-metal servers. Each of these holds a set of data that can range from thousands of files to potentially billions of files. Almost every distributed organization with multiple data centers and offices can present this challenging profile. The backup solution needs to track not only each file but also each version of each file for a period specified by the user. All the data about all these systems and the files they contain needs to be managed by the backup solution.
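To make the scale of that tracking problem concrete, here is a rough back-of-the-envelope sketch. Every figure in it (system counts, files per system, versions retained, and the 512-byte catalog entry size) is a hypothetical assumption chosen for illustration, not a measurement from any particular backup product.

```python
from dataclasses import dataclass


@dataclass
class SourceClass:
    """One category of protected system (all figures are hypothetical)."""
    name: str
    count: int              # number of systems of this kind
    files_per_system: int   # assumed average number of files on each system
    versions_retained: int  # assumed average versions kept per file


def catalog_records(sources):
    """Total file-version records the backup catalog must track."""
    return sum(s.count * s.files_per_system * s.versions_retained for s in sources)


# Illustrative numbers only; substitute your own inventory.
environment = [
    SourceClass("endpoints", 5_000, 200_000, 10),
    SourceClass("virtual machines", 300, 2_000_000, 30),
    SourceClass("bare-metal servers", 24, 50_000_000, 30),
]

BYTES_PER_RECORD = 512  # assumed average size of one catalog entry

total = catalog_records(environment)
print(f"catalog records: {total:,}")
print(f"approx. catalog size: {total * BYTES_PER_RECORD / 1e12:.1f} TB")
```

With these assumed inputs the catalog holds 64 billion file-version records, or roughly 33 TB of metadata, before any system-state information is added. The exact numbers matter less than the shape of the formula: record count grows with the product of systems, files and retained versions.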
Scaling and managing backup metadata is paramount to scaling the overall backup environment. Organizations also need the ability to extract information from the rich history that the backup solution is capturing. Tracking not only information about files but also system state can add another layer of value for the organization. Finally, the need to secure this data and respond to compliance demands on it makes a scalable metadata capability even more important.
Working Around the Backup Scaling Problem
Most backup solutions attempt to solve the scalability issue by creating a parent-child relationship between primary backup servers and secondary servers, often called media servers. The parent-child relationship enables the backup architecture to scale backup bandwidth but does little to help with metadata scalability. The primary backup server still holds all the metadata, and as that metadata grows, the single primary server becomes a performance bottleneck. The organization must either shrink the amount of metadata it retains, losing rapid access to old data, or continuously upgrade the primary backup server.
While most backup solutions can scale out storage capacity, most can’t scale out compute, and as a result they can’t deliver the processing power required to manage large metadata stores. Merely putting the backup software and storage in the cloud does not solve the lack of scale-out metadata processing. If the backup software can’t leverage cloud computing in a scale-out fashion, then the cloud provides no additional benefit over an on-premises solution.
Solving the Backup Scaling Problem
Solving the backup metadata scaling issue requires that the backup vendor scale out the computational power available to manage backup metadata. A backup solution that can scale out computing power can handle vastly larger metadata stores, which means it can also manage significantly more data objects than competing backup solutions. The cloud becomes a logical host for such a solution. It has ready access to all the cloud computing power it needs and can automatically provision more as demand requires. It can also provision extra computing power only for occasional tasks such as metadata verification or re-indexing.
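One common way to spread metadata work across compute nodes is to partition the catalog by a hash of each file's path. The sketch below illustrates that general idea only; it is not a description of how any specific backup product works. The function names, the path layout, and the per-node record limit are all hypothetical.

```python
import hashlib
from collections import Counter


def shard_for(path, num_nodes):
    """Map a file path to a metadata node using a stable hash."""
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def records_per_node(paths, num_nodes):
    """Count how many catalog records land on each node."""
    return Counter(shard_for(p, num_nodes) for p in paths)


def nodes_needed(total_records, node_record_limit):
    """Smallest node count that keeps each node at or under its assumed
    record limit, given an even spread (integer ceiling division)."""
    return (total_records + node_record_limit - 1) // node_record_limit


# Hypothetical catalog: 100,000 file paths spread over 100 hosts.
paths = [f"/host{h}/data/file{i}.dat" for h in range(100) for i in range(1000)]

for n in (1, 4, 16):
    load = records_per_node(paths, n)
    print(f"{n:>2} nodes -> busiest node holds {max(load.values()):,} records")

# Hypothetical sizing: 64 billion records, 2 billion records per node.
print(nodes_needed(64_000_000_000, 2_000_000_000))  # 32
```

With a single node, that node holds the whole catalog, which mirrors the single primary server described earlier. As nodes are added, the busiest node's share shrinks, and `nodes_needed` gives a first estimate of how much compute to provision. Simple modulo placement moves most records whenever the node count changes, so a design that adds and removes nodes often would more likely use consistent hashing or a similar scheme.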
The key for IT planners is to look for backup solutions that can scale out to meet their compute requirements as well as their storage needs. With such a solution, IT planners can see the value of hosting it in the cloud instead of on-premises.
A scalable backup architecture is critical when organizations try to protect a distributed enterprise with multiple data centers, dozens of remote offices and hundreds, if not thousands, of user devices. Protecting the distributed data center is the subject of our on-demand webinar “All in the Cloud, Data Protection Up, Costs Down.”