Appliances have arguably become the de facto product format for new technologies in IT for a number of reasons. They’re complete systems that typically don’t require other components and their ‘plug and play’ implementation is simpler than the software and hardware integration projects they often replace.
Disk backup has been an ideal application for the appliance format since an appliance can consolidate several backup infrastructure components and simplification is something that’s universally appealing. Appliances can be an enabler for mid-market and smaller users that don’t have lots of expertise and a time saver for larger organizations that do, but have plenty of other projects to keep them busy.
But appliances also have disadvantages, and limited scalability is chief among them, since many appliance designs don’t support adding capacity beyond the internal capacity of the initial system. So, when backup storage grows right along with the primary data sets it’s protecting, users of these fixed-capacity systems are forced to buy additional appliances. This leads to a condition known as “backup appliance sprawl,” which can have more serious implications than simply taking up data center space. And although most backup appliances leverage deduplication, it won’t resolve the scalability issue.
Deduplication
Most disk backup appliances include deduplication, which uses a common index to facilitate the comparisons that are fundamental to this data reduction process. For hash-based or ‘in-line’ deduplication, this comparison is made in real time, which means the index is typically maintained in RAM. Due to cost, the amount of RAM available in these appliance architectures is limited. This caps the size of the index, which in turn becomes the gating factor in how large these disk backup appliances can scale.
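To make the index constraint concrete, here is a minimal sketch of hash-based inline deduplication, not any vendor’s implementation. The fixed-size chunking, the SHA-256 fingerprint, the Python dict standing in for the in-RAM index, and the names `dedupe_inline` and `index_bytes_needed` are all assumptions chosen for illustration.

```python
import hashlib

CHUNK_SIZE = 8  # bytes; deliberately tiny for illustration (real systems use KB-sized chunks)


def dedupe_inline(stream: bytes, index: dict, store: list) -> list:
    """Store each unique chunk once and return the recipe to rebuild the stream.

    `index` maps chunk fingerprint -> position in `store`; it stands in for
    the in-RAM index that is consulted for every incoming chunk.
    """
    recipe = []
    for offset in range(0, len(stream), CHUNK_SIZE):
        chunk = stream[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).digest()
        if digest not in index:          # the real-time comparison
            index[digest] = len(store)
            store.append(chunk)
        recipe.append(index[digest])
    return recipe


def index_bytes_needed(unique_bytes: int, chunk_bytes: int, entry_bytes: int) -> int:
    """RAM for a naive index holding one entry per unique chunk stored."""
    return (unique_bytes // chunk_bytes) * entry_bytes


index, store = {}, []
recipe = dedupe_inline(b"AAAAAAAA" * 3 + b"BBBBBBBB", index, store)
print(recipe, len(store))  # [0, 0, 0, 1] 2
```

Under these assumed figures (8 KiB chunks, 40 bytes per index entry), `index_bytes_needed(100 * 10**12, 8 * 1024, 40)` comes to roughly 488 GB for 100 TB of unique data. Real products reduce this with techniques the sketch ignores, but it shows why the index, rather than the disk, can become the ceiling.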
Each appliance has its own data block index and therefore must perform its dedupe process on its own data set; it can’t leverage the indices in other appliances. This lowers the overall dedupe ratio across all appliances, which for many of these systems is fundamental to their economic justification. But aside from lowering effective capacity, backup appliance sprawl has some other side effects.
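The effect of separate indices can be illustrated with a small sketch. It assumes each backup stream has already been reduced to a list of chunk fingerprints (the string labels below are made up), and the function name `unique_chunks_stored` is hypothetical.

```python
from typing import List, Tuple


def unique_chunks_stored(appliances: List[List[str]]) -> Tuple[int, int]:
    """Return (siloed, global) counts of chunks that end up on disk.

    Each inner list holds the chunk fingerprints sent to one appliance.
    Siloed: every appliance dedupes only against its own index.
    Global: one index covers all the data.
    """
    siloed = sum(len(set(chunks)) for chunks in appliances)
    global_ = len(set().union(*appliances))
    return siloed, global_


appliance_a = ["os", "os", "db1", "db2"]
appliance_b = ["os", "mail1", "db1", "mail1"]
logical = len(appliance_a) + len(appliance_b)

siloed, global_ = unique_chunks_stored([appliance_a, appliance_b])
print(siloed, global_)                      # 6 4
print(logical / siloed, logical / global_)  # ratios: about 1.33 siloed vs 2.0 global
```

The "os" and "db1" chunks are stored twice in the siloed case because neither appliance can see the other’s index, which is the ratio penalty described above.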
Performance
One of the trade-offs made by the appliance model is flexibility. Appliances come preconfigured in order to simplify implementation, typically for moderate workload scenarios, and processing power usually can’t be scaled when data sets grow. Given that the controller overhead of the inline deduplication process is proportional to the amount of data, performance of these appliances can suffer as they start to fill up.
Growth
When more appliances are added, backup jobs must be load-balanced to make the most of the capacity on each appliance and to manage the throughput drop as each appliance fills up. This turns capacity planning across the infrastructure into a largely manual process, one that gets complex and time-consuming as the number of appliances increases.
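The placement problem is essentially bin packing. The sketch below uses a first-fit-decreasing heuristic as one plausible way an administrator might spread jobs over fixed-capacity appliances; the job names, sizes, 12 TB capacity, and the function `place_jobs` are all hypothetical.

```python
def place_jobs(job_sizes_tb: dict, appliance_capacity_tb: int) -> list:
    """First-fit decreasing: largest jobs first, each onto the first appliance with room."""
    appliances = []  # each entry: {"free": remaining TB, "jobs": [job names]}
    ordered = sorted(job_sizes_tb.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in ordered:
        if size > appliance_capacity_tb:
            # A single job that outgrows one box cannot be placed at all.
            raise ValueError(f"job {name!r} ({size} TB) exceeds one appliance")
        for appliance in appliances:
            if appliance["free"] >= size:
                appliance["free"] -= size
                appliance["jobs"].append(name)
                break
        else:
            # No existing appliance has room, so another one must be bought.
            appliances.append({"free": appliance_capacity_tb - size, "jobs": [name]})
    return appliances


jobs = {"exchange": 6, "oracle": 9, "fileshare": 5, "vmware": 4}
for appliance in place_jobs(jobs, appliance_capacity_tb=12):
    print(appliance["jobs"], "free:", appliance["free"], "TB")
```

In this example 24 TB of jobs needs three 12 TB appliances, leaving 12 TB stranded across them, and the placement has to be recomputed whenever a job grows.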
Management
As with any hardware implementation, more appliances create more management instances. This means more appliances to patch and maintain and more appliances to expand and upgrade. A growing collection of backup appliances can also mean more work to monitor and confirm that backups were completed.
Replication
Disk backup appliances offer a simple and effective way to get data offsite for DR purposes. Most require a second appliance be installed as a remote target and can accomplish the replication process on their own, or in conjunction with an enterprise backup application. As each additional backup appliance is put into the infrastructure to support the expanding primary environment, backup jobs get divided across multiple appliances. While most of these solutions support a ‘many to one’ architecture, allowing multiple primary site appliances to replicate to a single DR site unit, each replication job must still be managed independently.
Backup appliance sprawl is a problem that can creep into an IT environment, diminishing efficiency as each new box is added to the infrastructure. One answer is a scale-out storage architecture, like SEPATON’s S2100-ES2.
Scale-out Backup System
These systems can be deployed as a cluster of nodes, each contributing processing power that can maintain performance as capacities grow into the PB range. This grid-like architecture produces a single backup system and single management instance.
That’s one system to implement and manage, instead of a growing collection of appliances. When more backup capacity or performance is required, expansion involves adding more disk shelves or more nodes to the cluster. There’s no need to spread backup jobs between appliances to keep one from filling up prematurely and no concern over a single backup job growing too large for one appliance.
The scale-out architecture, with the ability to add capacity and processing power independently, has flexibility that the appliance format can’t match. This provides processing power that can keep up with capacity, eliminating the performance bottleneck that’s prevalent in most backup appliances.
Byte-differential Deduplication
Some of these scale-out systems also use a deduplication technology that can scale far beyond the limits of a backup appliance. Because they are built as a single system, they can apply true global deduplication across all the data in the backup environment, which produces better dedupe ratios and more effective capacity per TB. And instead of hampering backup performance as the system expands, as inline deduplication can, this byte-differential, content-aware deduplication process delivers deterministic backup performance independent of data set growth, data types or change rates. These systems can also let the user choose which backup jobs or data sets are deduplicated, so encrypted data or file types that don’t dedupe well can be skipped, saving processor cycles for data that benefits from them.
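A selective deduplication policy of the kind described could look something like the sketch below. It is purely illustrative: the `BackupJob` fields, the list of poorly deduplicating file types, and `should_dedupe` are assumptions, not the configuration model of any particular product.

```python
from dataclasses import dataclass

# Hypothetical examples of already-compressed formats that tend to dedupe poorly.
POOR_DEDUPE_TYPES = {"jpg", "mp4", "zip", "gz"}


@dataclass
class BackupJob:
    name: str
    file_type: str
    encrypted: bool = False


def should_dedupe(job: BackupJob) -> bool:
    """Skip deduplication for encrypted data and for file types on the skip list."""
    return not job.encrypted and job.file_type.lower() not in POOR_DEDUPE_TYPES


jobs = [
    BackupJob("oracle-nightly", "dbf"),
    BackupJob("hr-archive", "dbf", encrypted=True),
    BackupJob("video-library", "MP4"),
]
for job in jobs:
    print(job.name, "dedupe" if should_dedupe(job) else "skip")
```

Only the first job is deduplicated here; the encrypted archive and the video library pass straight to disk without consuming dedupe processing.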
Disk backup appliances are a good solution for many data centers; they’re simple to implement and bring a lot of functionality. But many have a fixed internal capacity so expansion means buying more appliances, which can lead to the problem of backup appliance sprawl. Scale-out backup systems, like SEPATON’s S2100-ES2 Series, can solve this sprawl problem and provide the consistent performance that backup managers can rely on, regardless of how large the environment gets. They can expand capacity easily in a single system, while maintaining deduplication ratios and keeping administrative overhead in check.
SEPATON is a client of Storage Switzerland
