In a recent blog post, we discussed how scale-up architectures break backup, forcing customers into expensive and disruptive forklift upgrades. Scale-out architectures are the logical solution to the scale-up challenge, but they create problems of their own. IT planners need to look for the right type of scale-out architecture.
Backup Software Doesn’t Scale-Out
One of the major challenges facing scale-out backup architectures is that, for the most part, only the backup storage hardware scales out. Yet scaling out the backup software is more vital than ever, for two reasons. First, there is the obvious impact of backing up more files, more applications, and more total capacity as production storage volumes continue to grow unabated.
Second, organizations now expect backup software to evolve and do more than just back up data. They want to leverage backup data and storage to create and host on-demand virtual volumes for rapid recoveries. They also expect to use those virtual volumes for reporting, testing and analytics, a set of capabilities often called copy data management. Additionally, many backup solutions now deliver functions like deduplication, compression, and replication that were previously available only in data protection hardware.
Organizations are smart to use the backup solution, and the data it contains, as more than just an insurance policy. Leveraging that data for more use cases makes the investment easier to cost-justify while lowering spending in other areas of the organization.
The problem is that most enterprise backup software solutions cannot scale to meet these new, broader demands. They can barely keep up with data growth, let alone deliver new features at acceptable performance levels, primarily because most of them are scale-up designs. Some vendors claim to offer a scale-out architecture, but in reality their products use a two-tier architecture with a master server and media servers. The master server typically can't distribute the processing of jobs, the data, or the management of the backup metadata.
Scale-Out Backup Storage Doesn’t Scale Right
Scale-out storage has been around for a while, but it generally has not worked well as a target for backup data. The first challenge is that scale-out storage solutions typically bottleneck on ingest. One or two nodes are responsible for initially receiving data, capturing metadata, and then distributing the actual data across the nodes in the cluster. Having one or two control nodes works fine for production data, which doesn't need to be ingested as rapidly as data protection storage does. Yet this one- or two-node design is also critical if the hardware is going to provide cluster-wide deduplication, because those control nodes see all incoming data and can check it against a single, cluster-wide index.
An alternative is a loosely coupled cluster of nodes, each of which can receive data independently, which solves the ingest bottleneck. The software still provides a single management point. The challenge with this approach is that it typically requires manually sending each backup job to a specific node, and then redirecting it to another node if a policy changes. Additionally, these loosely coupled clusters provide media protection (RAID) on a per-node basis, which reduces capacity efficiency. Finally, they can't provide cluster-wide deduplication, reducing capacity efficiency further. If two servers with similar data back up to the same cluster but are directed to different nodes, each node stores its own copy of the data the two servers have in common.
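To make the redundancy concrete, here is a minimal sketch comparing per-node deduplication (each node in a loosely coupled cluster keeps its own index) with cluster-wide deduplication (one shared index). It assumes fixed-size 4 KB chunks and SHA-256 fingerprints; the function names and the sample data are hypothetical and illustrative only, not any product's implementation.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size in bytes


def chunk_fingerprints(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split data into fixed-size chunks and return a SHA-256 fingerprint per chunk."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def stored_chunks_per_node(jobs: dict[str, bytes], placement: dict[str, str]) -> int:
    """Loosely coupled cluster: each node deduplicates only against its own index."""
    node_index: dict[str, set[str]] = {}
    for server, data in jobs.items():
        node = placement[server]
        node_index.setdefault(node, set()).update(chunk_fingerprints(data))
    return sum(len(index) for index in node_index.values())


def stored_chunks_global(jobs: dict[str, bytes]) -> int:
    """Cluster-wide deduplication: a single index shared by every node."""
    global_index: set[str] = set()
    for data in jobs.values():
        global_index.update(chunk_fingerprints(data))
    return len(global_index)


# Two hypothetical servers share 100 identical chunks and each has 1 unique chunk.
shared = b"".join(bytes([i]) * CHUNK_SIZE for i in range(100))
jobs = {
    "server_a": shared + bytes([200]) * CHUNK_SIZE,
    "server_b": shared + bytes([201]) * CHUNK_SIZE,
}
placement = {"server_a": "node1", "server_b": "node2"}  # jobs sent to different nodes

print(stored_chunks_per_node(jobs, placement))  # 202: shared chunks stored on both nodes
print(stored_chunks_global(jobs))               # 102: shared chunks stored once
```

With per-node indexes, the 100 shared chunks land on both nodes; with a cluster-wide index they are stored once, which is the capacity penalty described above.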
Fixing Scale-Out Protection
A solution is for IT planners to look for backup solutions with a complete scale-out strategy covering both software and hardware, in which the backup software essentially provides its own hyperconverged scale-out file system. This hyperconverged approach enables the backup software to execute functions across multiple nodes while also distributing data across those nodes. It reduces costs because storage capacity is internal to the servers, and it keeps capacity efficiency high because media failure protection is cluster-wide rather than per node. The backup software, which now has plenty of CPU power to take on the responsibility, handles deduplication before writing data to protection storage. As a result, the hyperconverged data protection solution can deliver global deduplication while still having the horsepower for rapid recovery and other copy data services.
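One way to picture software-driven global deduplication across nodes is the minimal sketch below. It assumes that any node can accept a backup stream, that chunks are fingerprinted with SHA-256, and that each fingerprint is "owned" by exactly one node chosen by simple modulo placement, so duplicates are caught no matter which node ingested them. The class and method names are hypothetical; a real system would also need replication or erasure coding and rebalancing when nodes are added, which this sketch ignores.

```python
import hashlib


class HypotheticalScaleOutDedupCluster:
    """Illustrative only: every node can ingest, but each chunk fingerprint is
    owned by exactly one node, so deduplication remains cluster-wide."""

    def __init__(self, num_nodes: int, chunk_size: int = 4096):
        self.num_nodes = num_nodes
        self.chunk_size = chunk_size
        # Per-node fingerprint -> chunk store (stands in for node-local disks).
        self.nodes: list[dict[str, bytes]] = [dict() for _ in range(num_nodes)]
        # Bytes accepted per entry node, to show ingest load is spread out.
        self.ingested_bytes = [0] * num_nodes

    def _owner(self, fingerprint: str) -> int:
        # Hash-based placement: the fingerprint decides which node owns the chunk.
        return int(fingerprint, 16) % self.num_nodes

    def ingest(self, data: bytes, entry_node: int) -> list[str]:
        """Accept a backup stream on any node; return its recipe (list of fingerprints)."""
        if not 0 <= entry_node < self.num_nodes:
            raise ValueError("unknown entry node")
        self.ingested_bytes[entry_node] += len(data)
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            fp = hashlib.sha256(chunk).hexdigest()
            store = self.nodes[self._owner(fp)]
            if fp not in store:  # dedup check against the owning node's index
                store[fp] = chunk
            recipe.append(fp)
        return recipe

    def restore(self, recipe: list[str]) -> bytes:
        """Reassemble a backup by fetching each chunk from its owning node."""
        return b"".join(self.nodes[self._owner(fp)][fp] for fp in recipe)

    def stored_chunks(self) -> int:
        return sum(len(store) for store in self.nodes)


# Same hypothetical data as a loosely coupled example: 100 shared chunks, 1 unique each.
CHUNK = 4096
shared = b"".join(bytes([i]) * CHUNK for i in range(100))
server_a = shared + bytes([200]) * CHUNK
server_b = shared + bytes([201]) * CHUNK

cluster = HypotheticalScaleOutDedupCluster(num_nodes=4)
recipe_a = cluster.ingest(server_a, entry_node=0)  # different entry nodes...
recipe_b = cluster.ingest(server_b, entry_node=2)
print(cluster.stored_chunks())  # 102: ...yet shared chunks are stored only once
```

The design choice here is that placement follows the data's fingerprint rather than the job's entry node, so ingest can be spread across nodes without giving up global deduplication.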
Conclusion
Both legacy scale-up architectures and current scale-out architectures face big challenges from the modern enterprise data center. Legacy designs were created primarily to handle the capacity problem, but today backup is being asked to do much more than store copies of data. It is time for enterprises to rethink how data protection software and hardware integrate and work together to deliver a more comprehensive and truly scalable solution.