There are two key assumptions when it comes to purpose-built backup appliances. First, that they must have inline deduplication, and second, that they will all be replaced by backup software solutions as those vendors add deduplication to their software and start selling their own turnkey appliances. Like most assumptions, these are probably wrong: inline deduplication has turned out to be a hindrance, and the dominance of turnkey backup appliances is far from assured.
The Inline Deduplication Burden
For years, there was a great debate over when and how to perform deduplication on an appliance. Inline deduplication means data is checked for redundancy before it is written to the backup appliance. The alternative, post-process deduplication, stores the data on the appliance first and deduplicates it later during idle time. There is a third option, adaptive deduplication, which ingests data like post-process does but starts the deduplication process immediately after the data is stored. The adaptive technique delivers immediate capacity savings similar to inline deduplication but can back off if resources become constrained.
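The difference between the three approaches is essentially one of timing. The sketch below is a toy model (hypothetical class and method names, fixed 4-byte chunks for brevity; real appliances use content-defined chunking and far larger chunks), but it captures when the hash-and-dedupe work happens in each case:

```python
import hashlib

def chunks(data: bytes, size: int = 4):
    """Split a stream into fixed-size chunks (real appliances use
    variable-length, content-defined chunking; fixed-size keeps this short)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

class Appliance:
    """Toy backup target contrasting the three deduplication timings."""

    def __init__(self):
        self.store = {}         # chunk hash -> chunk (deduplicated pool)
        self.landing_zone = []  # raw, undeduplicated chunks

    def ingest_inline(self, data: bytes):
        # Inline: hash and dedupe every chunk *before* it is written,
        # spending CPU in the backup's critical path.
        for c in chunks(data):
            self.store.setdefault(hashlib.sha256(c).hexdigest(), c)

    def ingest_post_process(self, data: bytes):
        # Post-process: land raw data at full speed; dedupe later.
        self.landing_zone.extend(chunks(data))

    def deduplicate_landing_zone(self, budget=None):
        # Adaptive = call this immediately after ingest with a resource
        # budget so it can back off; classic post-process = call it
        # later, during idle time, with no budget.
        n = len(self.landing_zone) if budget is None else min(budget, len(self.landing_zone))
        for c in self.landing_zone[:n]:
            self.store.setdefault(hashlib.sha256(c).hexdigest(), c)
        del self.landing_zone[:n]
```

In this model, adaptive deduplication is simply a post-process pass that starts right away and runs in budgeted increments, which is how it earns inline-like capacity savings without inline's ingest penalty.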
Conventional wisdom assumed that inline deduplication was the preferred method if it did not impact backup performance. It also assumed that the only reason to use post-process deduplication was to save on processor and RAM expense. High-performance CPU and large RAM quantities were required if inline deduplication were to have negligible impact.
There are a couple of problems with the conventional wisdom. First, it only considers the impact on the backup, also known as ingest performance. It does not consider recovery performance. Large recoveries, such as full system restores or copies to tape, are very much impacted by deduplication, since the data must be reassembled (rehydrated) while the recovery is happening.
The second problem is potentially a bigger one for inline deduplication and conventional wisdom: the advent of instant recovery features. Instant recovery, also known as live recovery, boot-from-backup, and in-place recovery, is a feature by which the data protection software presents a virtual volume of protected data on the appliance, which the application can use directly for recovery, enabling fast, transfer-less recoveries. The feature is wildly popular, and every backup software vendor either has added it or is in the process of adding it.
The problem for deduplication is that the application accessing this virtual volume is in production, and it expects near-production performance. If the volume is on a device with inline deduplication, each read requires that the data first be reassembled. In other words, performance will likely be terrible, to the point of being unusable, negating the usefulness of the feature.
Post-process deduplication gets around both of these problems by having a separate landing zone where the latest backup is stored before it is deduplicated. Since the newest backup is the one most likely used for restores, tape copies and live recoveries, performance will be much more appropriate.
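The landing-zone idea can be sketched in a few lines (a toy model with hypothetical names and tiny fixed-size chunks, not ExaGrid's actual implementation): the newest backup stays intact, so restoring it is a plain read, while only older backups pay the chunk-by-chunk reassembly cost.

```python
import hashlib

def dedupe(data, size=4):
    """Chunk a stream, keep only unique chunks, and return the ordered
    'recipe' of chunk hashes needed to reassemble it later."""
    pool, recipe = {}, []
    for i in range(0, len(data), size):
        chunk = data[i:i + size]
        digest = hashlib.sha256(chunk).hexdigest()
        pool.setdefault(digest, chunk)
        recipe.append(digest)
    return pool, recipe

class LandingZoneAppliance:
    """The newest backup sits intact in the landing zone; only older
    backups are deduplicated and must be rehydrated on restore."""

    def __init__(self):
        self.landing_zone = None  # latest backup, undeduplicated
        self.pool = {}            # shared deduplicated chunk store
        self.recipes = []         # one recipe per older backup

    def ingest(self, data):
        if self.landing_zone is not None:
            # The previous "latest" backup is deduplicated into the pool.
            pool, recipe = dedupe(self.landing_zone)
            self.pool.update(pool)
            self.recipes.append(recipe)
        self.landing_zone = data

    def restore_latest(self):
        return self.landing_zone  # sequential read, no rehydration

    def restore_older(self, index):
        # Older restores walk the recipe and reassemble chunk by chunk,
        # which is the work an all-inline device does on *every* restore.
        return b"".join(self.pool[h] for h in self.recipes[index])
```

The design trade is clear even at this scale: the landing zone spends some raw capacity on the latest backup in exchange for full-speed restores, tape copies, and instant-recovery reads of exactly the data most likely to be needed.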
The Turnkey Dedicated Backup Appliance Problem
Conventional wisdom also states that backup software companies will eventually deliver a turnkey appliance that will include their software and enough storage to meet the backup requirement. The problem with the turnkey approach is that while it may speed installation, it severely limits flexibility. The organization has to be very sure that they will stick with the data protection solution for a long time.
The turnkey concept flies in the face of the purpose-built backup appliance, which is turnkey up to the point of software. When it comes to software, it can support multiple solutions simultaneously. For example, if the customer wants to use one solution for their physical environment and another for the virtual environment, they can. This flexibility may become increasingly critical as the data center continues to embrace advanced concepts like containers and NoSQL databases, each of which (at least initially) has its own backup software solutions.
In fact, the industry trend continues to move away from a single software solution that protects everything. Instead, there is growing evidence of a continual niching down into very environment-specific data protection. The purpose-built backup appliance’s ability to support multiple data protection applications makes it an ideal consolidation point for all of the protected data.
ExaGrid Update – Defying Conventional Wisdom and Moving Up Market
ExaGrid has defied conventional wisdom almost from its start. It is a scale-out backup storage system that leverages a landing zone and adaptive deduplication, making it ideal for the instant recovery era. It can also support a wide variety of data protection and archive applications. Legacy applications and new applications can both store their data on ExaGrid appliances. Deduplication occurs once the data is committed to disk but in parallel with the backup window, i.e., it has the DR RPO of inline but the ingest and restore performance of post-process.
Even its choice of scale-out methodology is tailor-made for backups. Its scale-out system is built from backup storage appliances of various sizes. While a central unit manages the cluster, each node acts independently. The end result is that backup applications can send backups to the specific appliance of their choice, and appliances of different sizes and ages can be part of the same single system.
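Per-node independence turns job placement into a simple balancing problem. The sketch below is an illustration of that idea (hypothetical node names, sizes, and placement policy; not ExaGrid's actual scheduler): each job lands on the node with the most free space that can hold it.

```python
# Hypothetical mixed-size cluster: nodes of different capacities and
# vintages coexist in one system (capacities are illustrative only).
nodes = [
    {"name": "node-a", "capacity_tb": 63, "used_tb": 20},
    {"name": "node-b", "capacity_tb": 40, "used_tb": 10},
    {"name": "node-c", "capacity_tb": 63, "used_tb": 55},
]

def place_job(nodes, job_tb):
    """Pick the node with the most free space that can hold the job.
    Because every node has its own landing zone and CPU, jobs can be
    spread across nodes instead of funneled through one controller."""
    fits = [n for n in nodes if n["capacity_tb"] - n["used_tb"] >= job_tb]
    if not fits:
        return None  # cluster is full for a job this size
    best = max(fits, key=lambda n: n["capacity_tb"] - n["used_tb"])
    best["used_tb"] += job_tb
    return best["name"]
```

Because older, smaller nodes simply advertise less free space, they naturally take smaller jobs while remaining full members of the system.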
These early design choices, plus a patient approach to the market, have led to success both in the mid-market data center and in the enterprise. To further solidify its enterprise position, ExaGrid recently announced the EX63000E, a 4U appliance capable of storing a 63TB full backup, nearly 58% larger than ExaGrid’s previous high-end solution, the EX40000. A full scale-out 32-appliance system equipped with EX63000Es can ingest a 2PB-plus full backup and can maintain 32PB of logical data (deduplicated retention). Like the other ExaGrid models, it supports NFS, CIFS, Veeam Data Mover, Veeam SOBR, Veritas OST, and Oracle RMAN Channels.
One of the most important elements of the data protection process is completing the backup. From an ingest performance perspective, the EX63000E can ingest 13.5TB/hour per appliance, which means a 32-appliance system can ingest 432TB/hour. And remember, each appliance is directly addressable, so as long as the jobs are balanced across appliances, these ingest rates are realistic.
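The arithmetic behind those figures is worth checking, and it also yields a rough backup-window estimate. The window calculation below is my own illustration using the article's published rates, assuming perfectly balanced jobs and ignoring real-world overheads:

```python
# Figures from the text: per-appliance ingest rate and system size.
per_appliance_tb_per_hour = 13.5
appliances = 32

# Aggregate rate is linear because each appliance ingests independently.
system_rate_tb_per_hour = per_appliance_tb_per_hour * appliances  # 432 TB/hr

# Illustrative backup window for the "2PB plus" full backup the
# article cites (idealized: balanced jobs, no network or client limits).
full_backup_tb = 2.0 * 1000
window_hours = full_backup_tb / system_rate_tb_per_hour  # ~4.6 hours
```

Under these idealized assumptions, a 2PB full backup fits comfortably inside a typical overnight window, which is the practical point of the linear scale-out claim.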
ExaGrid made a few design decisions early on that defied conventional wisdom: adaptive deduplication, staying out of data protection software, and a scale-out architecture with a landing zone. Now, though, each of those choices seems very much in step with where the rest of the industry is heading. There are still other hurdles, like navigating if, how, and when IT will integrate the cloud into the data center, but given ExaGrid's track record, one can expect the company to be prepared for any contingency.