Backup is not Archive

To protect their data amid explosive data growth, many organizations have started backing up to the cloud, both to reduce storage and data center costs and to obtain data redundancy without maintaining a separate physical disaster recovery (DR) site. Many also mistakenly believe that these additional backup copies qualify as archive copies. Unfortunately, they do not.

Backup vs. Archive

As we discussed in our article, What Is Archive Anyway?, there is a big difference between a backup copy and an archive copy.

A backup is the recurring, systematic copying of active data, meaning data that is frequently accessed and modified, in order to preserve its current content. These backup copies are made at regularly scheduled intervals so that the active data can be restored in the event of a system failure, file overwrite or file deletion, whether deliberate or accidental. Active data is usually stored on tier 1 storage, which consists of flash and/or high-speed disk. The backups themselves are usually kept on tier 2 storage, which is cost-effective but performs reasonably well, to ensure rapid response to restore requests and acceptable performance if the backup file or snapshot is mounted in a virtual machine (VM).

An archive, however, is a static copy of groups of older, inactive data that is not needed for daily operations. This inactive or cold data is not modified and is accessed only occasionally for historical reference, or not at all. Typically, data is considered cold if it has not been accessed or modified in over 90 days. The challenge with the archive data set is that no one knows which component of it will be accessed or when that access will occur. While the response to a request from an archive set does not need to be instantaneous, it does need to arrive in a reasonable time.
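As a rough illustration of the 90-day rule of thumb, the Python sketch below walks a directory tree and lists files that have been neither accessed nor modified within the window. The names `COLD_AFTER_DAYS`, `is_cold` and `find_cold_files` are hypothetical, chosen for this example, and the approach assumes the file system records usable access times. Some volumes are mounted with options that limit access-time updates, so treat the result as an approximation rather than as how any particular archive product works.

```python
import os
import time

# Threshold taken from the rule of thumb in the text; a real policy would
# be set by the administrator or business unit manager.
COLD_AFTER_DAYS = 90
SECONDS_PER_DAY = 86400


def is_cold(path, now=None, cold_after_days=COLD_AFTER_DAYS):
    """Return True if the file was neither accessed nor modified in the window."""
    now = time.time() if now is None else now
    st = os.stat(path)
    # The most recent of access time and modification time decides.
    last_touched = max(st.st_atime, st.st_mtime)
    return (now - last_touched) > cold_after_days * SECONDS_PER_DAY


def find_cold_files(root, now=None, cold_after_days=COLD_AFTER_DAYS):
    """Walk root and return a sorted list of regular files considered cold."""
    cold = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and is_cold(full, now, cold_after_days):
                cold.append(full)
    return sorted(cold)
```

Calling `find_cold_files("/some/share")` (a placeholder path) returns the candidates for migration. Deciding what to do with them, and where they go, remains a policy question.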

The archive copy is normally created only when the data has not been modified or accessed for a specific period of time that is defined by the administrator or business unit manager. These archive copies are stored indefinitely on less expensive media that is not in the backup path. There are two components to the archive process. First, there are typically one or more applications that identify and move data from the active tier to the archive tier. Second, there are multiple storage hardware targets, typically some combination of disk, tape and cloud.

The combination of multiple archive sources and multiple archive targets has historically made the process of tiering data to less expensive storage both costly and extremely complex. To avoid this complexity, data centers have instead continued to expand production storage and to count on backup to provide an archive-like function. The result has been an even more expensive alternative.

Appliances are now appearing on the market to consolidate the archive mess by integrating the archive software and abstracting the management of multiple archive components. The result is less expensive primary storage and a simpler data protection process.

How Archive Complements Backup

If you closely examine the differences between backup and archive functions, it becomes readily apparent that archive is actually an integral part of the backup process with each function protecting different types of data.

Backup protects active data while archive protects cold data. However, archive goes beyond just protecting cold data. It also enhances and simplifies the backup process and frees up expensive primary storage.

Consider some basic demands of the backup process itself:

  • Time to back up target data, both active and inactive
  • Amount of primary storage required to store the backup copies and snapshots
  • Amount of primary storage required for the backup application database that tracks all files it backs up and media it manages

A good archive solution will automatically identify and migrate cold data from expensive primary storage to less expensive secondary or tertiary storage tiers, like tape. The archive process yields the following benefits:

  • Reduces the amount of data that needs to be backed up
  • Reduces the time needed to perform backup operations
  • Reduces the size of the backup application database since it has to track fewer files
  • Reduces the size of the backup storage repository
  • Frees up expensive primary storage, which avoids the need to purchase additional storage or to have IT personnel manage additional disks or appliances
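The benefits above lend themselves to a quick back-of-the-envelope estimate. The sketch below is only an illustration: the function name `backup_footprint`, its parameters and the sample figures are hypothetical and not taken from any measurement. It also assumes that full-backup time scales linearly with data volume, which real environments only approximate.

```python
def backup_footprint(total_tb, cold_fraction, throughput_tb_per_hour):
    """Estimate full-backup size and duration before and after archiving cold data.

    total_tb               -- data currently on primary storage, in TB
    cold_fraction          -- share of that data considered cold (0.0 to 1.0)
    throughput_tb_per_hour -- sustained backup throughput (assumed constant)
    """
    if not 0.0 <= cold_fraction <= 1.0:
        raise ValueError("cold_fraction must be between 0.0 and 1.0")
    if throughput_tb_per_hour <= 0:
        raise ValueError("throughput_tb_per_hour must be positive")

    # Only the active data stays on primary storage and in the backup path.
    active_tb = total_tb * (1.0 - cold_fraction)
    return {
        "before_tb": total_tb,
        "after_tb": active_tb,
        "before_hours": total_tb / throughput_tb_per_hour,
        "after_hours": active_tb / throughput_tb_per_hour,
        "primary_freed_tb": total_tb - active_tb,
    }


if __name__ == "__main__":
    # Hypothetical example: 100 TB on primary storage, 75% of it cold,
    # and a backup system that sustains 5 TB per hour.
    for key, value in backup_footprint(100, 0.75, 5).items():
        print(f"{key}: {value}")
```

With these made-up inputs, the data in the backup path drops from 100 TB to 25 TB and the estimated full-backup window from 20 hours to 5. Substituting your own numbers gives a first-order view of what an archive tier might save.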

As storage capacity demands continue to grow uncontrollably, the current strategy of scaling primary storage and leveraging backup as an archive becomes untenable. The cost benefits of a good archive solution should not be underestimated. All these facts make it clear that archive should be an integral part of a comprehensive storage strategy that includes a small but high-performing active storage tier, a slightly larger and modestly performing data protection tier, and a high-capacity, cost-effective archive tier.

Sponsored by FujiFilm Dternity Powered By Strongbox

Joseph is a Lead Analyst with DSMCS, Inc. and an IT veteran with over 35 years of experience in the high tech industries. He has held senior technical positions with several major OEMs, VARs, and System Integrators, providing them with technical pre- and post-sales support for a wide variety of data protection solutions. He also wrote numerous technical analyst articles for Storage Switzerland and served as its chief editor for all technical content until Storage Switzerland closed upon its acquisition by StorONE. In the past, he also designed, implemented and supported backup, recovery and encryption solutions, and provided Disaster Recovery planning, testing and data loss risk assessments in distributed computing environments on UNIX and Windows platforms for various OEMs, VARs and System Integrators.

8 comments on “Backup is not Archive”
  1. Brad Karlberg says:

    Why is there no serious effort on expiration dates for data? We love selling more HW but at some point we may need to do something other than riding the wave. ILM was in vogue for a very short period of time. Anyone interested in taking this on?

  2. Ed says:

    Heard a quote once, using it ever since: “Backup for recovery, archive for discovery.” So true. A lot of people can’t tell the difference and end up clogging their backup infrastructure with unnecessary and often unrecoverable data.

  3. Las says:

    Good article!

  7. I ran deduped SQL backups for almost a year in parallel with tape backups. In the end we dumped the dedupe system and spent $$$ to upgrade to LTO-4 tape. The dedupe had some amazing compression and it was a lot better than tape. But LTO-4 tapes are $55 for 1.6TB tapes which in the real world come out to almost 3TB of storage per tape.
