Data protection is seldom a one-size-fits-all proposition. Recovery objectives vary based on the criticality of the application as well as the situation in which the recovery needs to occur. Most applications require a suite of recovery technologies to meet all of their recovery objectives, so a tiered approach to application recovery is required. These tiers of protection should be independent for maximum resiliency and flexibility. The problem is that some backup applications try to offer them all in a single software package, creating a single point of failure in data protection in the process.
Types of Data Protection
Most enterprise-class storage systems have multiple levels of redundancy and protection, including RAID and highly available or redundant components. Data protection is usually the process of getting data stored away from the original storage system so that data can be recovered if that system fails or in the event of a site disaster. There are multiple data protection techniques available to copy data out of a storage system, and each has a specific advantage over the others depending on the recovery situation.
Backup Data Protection
Backup is usually the last line of defense. During a backup process, data is copied through a backup application to an alternative device or devices. Backup should be the foundation of any data protection strategy because it provides an isolated copy of data with a purposeful gap in time between the production data set and the backup data set. This gap allows IT to identify data corruption or recognize a virus attack on production data so that a clean copy can be recovered from the backup device.
Purpose-built backup appliances (PBBAs), like EMC’s Data Domain product family, allow data to be backed up to a single disk appliance from a variety of backup applications, so backups can be performed directly from within each application, providing maximum flexibility. PBBAs like EMC’s Data Domain systems also leverage deduplication and compression to store a long-term data set cost effectively, making disk a financially viable medium for storing backup data.
The backup copy of data should also be moved off-site in case a recovery broader than a server outage or a data corruption, such as a site disaster, is needed. There are two reasons for this. First, in a disaster the PBBA may be called on to facilitate restores of less critical servers in the environment not covered by the replication strategy outlined below. Second, because of the economic advantage of a PBBA, these systems often serve as an organization’s archive, meaning they may hold the last and only surviving copy of certain data sets.
The off-site movement of the backup repository used to occur by physically transporting the protected copies of data off-site. Now, thanks to PBBA technologies that leverage WAN-optimized deduplication, backup data can be replicated rapidly to the remote site. This replication should not be confused with primary storage replication, which again will be detailed below. Backup data replication remains a distinct and separate process with a specific time gap.
Primary Storage Replication
Primary storage replication creates a real-time or near real-time copy of production data. That secondary data copy is constantly refreshed as the primary production data is updated. Its sole purpose is to provide a rapid recovery point objective in the event of a primary storage device failure or site disaster. The secondary copy can reside on an alternative device locally in the data center, remotely at a disaster recovery site or, most typically, a combination of the two.
Replication can occur either synchronously or asynchronously, depending again on the demands of the application. Synchronous replication means that the data has to be written on both storage systems, potentially in two sites, before the acknowledgement is sent to the application, keeping the two copies always consistent. Asynchronous replication allows the primary storage system alone to acknowledge the write, which means that the secondary copy can lag behind the primary copy.
For mission critical applications that have a low tolerance for data loss, synchronous replication may be required, which means an extra investment in storage systems and WAN connections fast enough to acknowledge the write on both sides. For most other applications, asynchronous replication, while trailing the primary copy slightly, is appropriate because it does not require the high bandwidth and low latency of a synchronous connection. This lowers cost and allows for greater distance between the primary and the disaster recovery site.
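The acknowledgement difference between the two modes can be illustrated with a minimal sketch. This is not any vendor's implementation; the function and variable names are purely illustrative, and the "storage systems" are simple dictionaries.

```python
import queue

def write_sync(primary, replica, block, data):
    # Synchronous: the write must land on both systems before the
    # application receives its acknowledgement, so the two copies
    # are always identical (at the cost of the round trip to the
    # replica, potentially across a WAN).
    primary[block] = data
    replica[block] = data
    return "ack"

def write_async(primary, replica_queue, block, data):
    # Asynchronous: only the primary write gates the acknowledgement.
    # The update is queued and applied to the replica later, so the
    # secondary copy can lag behind the primary.
    primary[block] = data
    replica_queue.put((block, data))
    return "ack"
```

In the synchronous case the application's perceived write latency includes the replica write; in the asynchronous case it does not, which is why asynchronous replication tolerates slower, longer-distance links.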
While some integration of the monitoring and management of the backup and replication functions is ideal, it is important that the replication and backup processes not be dependent on each other. For example, if the backup function is essentially an extended-distance data replication process, data protection becomes more expensive and just as vulnerable to a cyber attack or corruption, since a data problem can quickly propagate through the various copies.
There is also a greater need for replication targets to have lower latency and higher performance, because they need to be in sync, or close to it, with the primary storage system they are protecting. These systems may also be called on to host the protected application and serve as production storage in the event of a hardware failure, something a PBBA would not be called on to do.
This combination of factors makes the target device more expensive than an equivalent PBBA. It makes sense, then, that the replication target should hold only the absolute latest copy of data. Ideally, the PBBA would store older versions of data, since it can do so more cost effectively than a real-time or near real-time primary data replication store.
PBBAs like EMC’s Data Domain systems use high capacity hard drives combined with deduplication and compression to extract maximum value from their storage capacity. This allows the PBBA to store months, if not years, worth of data much more cost effectively than a replicated system.
Primary Storage Snapshots
The other form of data protection is snapshots. Almost all storage arrays use something like a catalog to organize data: they create an index of where data physically resides on disk. When an application requests data, the request actually goes to the index and is then rerouted to the physical location on disk.
Snapshots leverage this index. When a snapshot is requested, the index is copied and all the blocks it points to are set into a read-only mode. If a change is made to a block of data captured by a snapshot, the storage system writes the change to a new block and updates the active index to point to it, instead of overwriting the old data block. The copied index then maintains the reference to the old data block.
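The mechanism described above can be sketched in a few lines. This is a deliberately simplified model, not any array's actual metadata format: the class, its dictionaries and its method names are all assumptions made for illustration.

```python
class ArrayIndex:
    """Toy model of an array index with redirect-on-write snapshots."""

    def __init__(self):
        self.blocks = {}      # physical block id -> data
        self.index = {}       # logical block number -> physical block id
        self.snapshots = []   # frozen copies of the index
        self._next_block = 0

    def write(self, logical, data):
        # Never overwrite in place: allocate a new physical block and
        # repoint the active index, so frozen snapshot indexes keep
        # referencing the old, effectively read-only blocks.
        self.blocks[self._next_block] = data
        self.index[logical] = self._next_block
        self._next_block += 1

    def snapshot(self):
        # A snapshot is just a copy of the index; no data is moved.
        self.snapshots.append(dict(self.index))
        return len(self.snapshots) - 1

    def read(self, logical, snap=None):
        idx = self.index if snap is None else self.snapshots[snap]
        return self.blocks[idx[logical]]
```

Taking a snapshot copies only the (small) index, which is why snapshots are nearly instantaneous and space-efficient, and also why they are worthless if the array holding the index and blocks fails.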
This process allows snapshots to provide a point in time capability similar to the backup process. The major difference, however, is that snapshots are fully dependent on the underlying storage system managing them to be 100% operational. If that storage system fails, the snapshot information is lost with it. In other words, this form of data protection does not move data off of the system. Its value is in helping organizations recover from non-storage system related failures, like a database corruption.
With a trustworthy backup system in place, snapshots should be used primarily for protection between backup events. For example, if an application is protected once per day by the backup application, an hourly snapshot schedule throughout the day may make sense, but those snapshots should be expired and removed once the backup process has completed. Also, since snapshots are static, point-in-time copies of data, they can be effectively leveraged by the backup process to capture and protect a copy of production data non-disruptively.
Once again, integration with the PBBA is ideal here, but only to a certain degree. The snapshot is completely dependent on the primary copy of data since it leverages the array’s data index.
If a corruption of this index occurs or if the storage system itself fails, then all the snapshot copies are lost. A PBBA, on the other hand, maintains a separate standalone copy of data that is resilient to any such failure on the primary storage system.
The Value of a Tiered Data Protection Strategy
There is a hierarchy of sorts for the servers under protection in the environment, and each tier requires a different recovery time objective (RTO). The less time a system can be unavailable, the more expensive the level of protection required. This is the value of a tiered approach. Mission critical servers, which generally represent less than 1% of the server population, may need synchronous replication. Business critical servers, which generally represent about 10% of servers, may require asynchronous replication. The remaining roughly 90% of servers can generally meet their RTOs with a PBBA-based recovery. Primary storage snapshots, in turn, can be leveraged to facilitate the rapid recovery of data following a non-storage failure for mission critical and business critical systems. The key is to utilize all of these tools to protect data and provide the business with the right RTO at the right cost.
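The tiering described above amounts to a simple policy mapping. The sketch below is illustrative only; the tier names and the percentages in the comments come from the breakdown above, not from any product.

```python
def protection_for(tier):
    """Map a server's criticality tier to its primary protection method."""
    policy = {
        "mission-critical": "synchronous replication",    # <1% of servers
        "business-critical": "asynchronous replication",  # ~10% of servers
        "standard": "PBBA-based backup",                  # ~90% of servers
    }
    return policy[tier]
```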
Standardizing on a single application to drive backup, replication and snapshot data protection processes could be risky and limit flexibility. A tiered approach to data protection, however, is ideal, leveraging the three protection types for their intended roles. It also provides the most cost effective way to manage RTOs by matching server criticality with the appropriate level of data protection.
Storage Switzerland recommends that IT professionals utilize snapshots for temporal copies of data with a short life span, leverage replication to mitigate the risk of a system failure or site disaster, and use backup expressly for the long-term retention of data. A single point of management for these three processes is ideal, but the processes should not be dependent on each other. Ideally, they should be able to work as standalone entities so that a failure of one does not cause a failure of all three.
EMC is a client of Storage Switzerland