To protect their data amid explosive data growth, many organizations have started backing up to the cloud, both to reduce storage and data center costs and to obtain data redundancy without maintaining a separate physical disaster recovery (DR) site. Many also mistakenly believe that these additional backup copies qualify as archive copies. Unfortunately, they do not.
Backup vs. Archive
As we discussed in our article, What Is Archive Anyway?, there is a big difference between a backup copy and an archive copy.
A backup is the recurring, systematic copying of active data, which is frequently accessed and modified, in order to preserve its current content. These backup copies are made at regularly scheduled intervals so that the active data can be restored in the event of a system failure, file overwrite or file deletion, whether deliberate or accidental. The active data itself is usually stored on tier-one storage, which consists of flash and/or high-speed disk. The backups are usually kept on tier-two storage, which is cost-effective but still performs reasonably well, to ensure rapid response to restore requests and to provide acceptable performance if the backup file or snapshot is mounted in a virtual machine (VM).
An archive, however, is a static copy of groups of older, inactive data that is not needed for daily operations. This inactive or cold data is not modified and is accessed only occasionally for historical reference, or not at all. Typically, data is considered cold if it has not been accessed or modified in over 90 days. The challenge with the archive data set is that no one knows which part of it will be accessed or when that access will occur. While the response to a request for archived data does not need to be instantaneous, it does need to be reasonably quick.
The archive copy is normally created only when the data has not been modified or accessed for a specific period of time that is defined by the administrator or business unit manager. These archive copies are stored indefinitely on less expensive media that is not in the backup path. There are two components to the archive process. Typically, there is an application or applications that identify and move data from the active tier to the archive tier. There are also multiple storage hardware targets, typically some combination of disk, tape and cloud.
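The two steps described above, identifying cold data and moving it off the active tier, can be sketched in Python. This is a minimal illustration, not how any particular archive product works: the function names (is_cold, find_cold_files, migrate) are hypothetical, the 90-day threshold is the rule of thumb mentioned earlier, and the archive tier is stood in for by a plain directory. It also assumes the file system records access times reliably, which is not the case on volumes mounted with options such as noatime.

```python
import os
import shutil
import time
from pathlib import Path

COLD_AFTER_DAYS = 90  # rule-of-thumb threshold from the text; real policies vary
SECONDS_PER_DAY = 86400


def is_cold(path, now=None, cold_after_days=COLD_AFTER_DAYS):
    """True if the file was neither accessed nor modified within the threshold."""
    now = time.time() if now is None else now
    st = os.stat(path)
    last_touched = max(st.st_atime, st.st_mtime)
    return (now - last_touched) > cold_after_days * SECONDS_PER_DAY


def find_cold_files(root, now=None, cold_after_days=COLD_AFTER_DAYS):
    """Walk the active tier (root) and return the regular files that are cold."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and is_cold(p, now, cold_after_days)
    )


def migrate(root, archive_root, now=None, cold_after_days=COLD_AFTER_DAYS):
    """Move cold files to archive_root, keeping their relative paths.

    Assumes archive_root is a directory outside root (hypothetical layout).
    """
    moved = []
    for src in find_cold_files(root, now, cold_after_days):
        dst = Path(archive_root) / src.relative_to(root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dst))
        moved.append(dst)
    return moved
```

Calling migrate("/data/active", "/data/archive") with those two hypothetical paths would relocate every file untouched for more than 90 days and return the list of new locations. A real archive application would also need to handle the multiple targets mentioned above (disk, tape and cloud) and leave users a way to find the moved data.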
The combination of multiple archive sources and multiple archive targets has historically made the process of tiering data to less expensive storage both expensive and extremely complex. To avoid this complexity, data centers have instead continued to expand production storage and to count on backup to provide an archive-like function. The result has been an even more expensive alternative.
Appliances are now appearing on the market to consolidate the archive mess by integrating the archive software and abstracting the management of multiple archive components. The result is less expensive primary storage and a simpler data protection process.
How Archive Complements Backup
If you closely examine the differences between backup and archive functions, it becomes readily apparent that archive is actually an integral part of the backup process with each function protecting different types of data.
Backup protects active data while archive protects cold data. However, archive goes beyond just protecting cold data. It also enhances and simplifies the backup process and frees up expensive primary storage.
Consider some basic demands of the backup process itself:
- Time to back up target data, both active and inactive
- Amount of primary storage required to store the backup copies and snapshots
- Amount of primary storage required for the backup application database that tracks all files it backs up and media it manages
A good archive solution will automatically identify and migrate cold data from expensive primary storage to less expensive secondary or tertiary storage tiers, such as tape. The archive process yields the following benefits:
- Reduces the amount of data that needs to be backed up
- Reduces the time needed to perform backup operations
- Reduces the size of the backup application database since it has to track fewer files
- Reduces the size of the backup storage repository
- Frees up expensive primary storage, which avoids the need to purchase additional storage and the need for IT personnel to manage additional disk or appliances
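The first few benefits in the list above come down to simple arithmetic: once cold files leave the backup path, the backup has fewer files to track and fewer bytes to copy. The sketch below illustrates this under stated assumptions. The FileInfo type, the function names and the sample sizes are all hypothetical and chosen only for illustration; they are not measurements from any real environment.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    size_bytes: int
    cold: bool  # True if the file would qualify for the archive tier


def backup_footprint(files):
    """Return (file_count, total_bytes) that a backup run would have to process."""
    return len(files), sum(f.size_bytes for f in files)


def footprint_after_archiving(files):
    """Footprint once cold files have been moved out of the backup path."""
    active = [f for f in files if not f.cold]
    return backup_footprint(active)


# Made-up example: one active file and two cold files.
files = [FileInfo(100, cold=False), FileInfo(400, cold=True), FileInfo(500, cold=True)]
print(backup_footprint(files))           # (3, 1000)
print(footprint_after_archiving(files))  # (1, 100)
```

In this invented example the backup shrinks from three files and 1,000 bytes to one file and 100 bytes. The actual reduction depends entirely on how much of an organization's data is cold.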
As storage capacity demands continue to grow uncontrollably, the current strategy of scaling primary storage and leveraging backup as an archive becomes untenable. The cost benefits of a good archive solution should not be underestimated. All these facts make it clear that archive should be an integral part of a comprehensive storage strategy that includes a small but high-performing active storage tier, a slightly larger and modestly performing data protection tier, and a high-capacity, cost-effective archive tier.
Sponsored by FujiFilm Dternity Powered By Strongbox