Archive is Better than Backup

Posted on October 18, 2018 by George Crump

For decades, data center best practice was to keep backup and archive separate. “Backup is not archive” was the mantra. In reality, most data centers ignored the mantra and used their backup process for all of their data retention. Today, however, as data centers deal with unprecedented growth in their unstructured data sets, it may be time to change the mantra to “archive is better than backup.” Organizations need to take an archive-first approach to data protection.

Is Backup Dead?

Backup is a process that copies data from primary storage to some form of secondary storage. As unstructured data continues to grow, organizations have modified their software to perform an image backup of the entire file system instead of a file-by-file backup. Subsequent backups only need to back up changed blocks, further lowering backup times.
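As a rough illustration of the changed-block idea, the sketch below splits an image into fixed-size blocks, hashes each one, and reports which blocks differ from the baseline. The block size and the function names are assumptions made for this example. Real products may track changes in other ways, so treat this as a sketch of the concept rather than how any particular vendor does it.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, chosen only for illustration


def block_hashes(image: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Return one SHA-256 digest per fixed-size block of the image."""
    return [
        hashlib.sha256(image[i:i + block_size]).hexdigest()
        for i in range(0, len(image), block_size)
    ]


def changed_blocks(baseline: list[str], current: list[str]) -> list[int]:
    """Return indexes of blocks that differ from, or were added since, the baseline."""
    return [
        i for i, digest in enumerate(current)
        if i >= len(baseline) or digest != baseline[i]
    ]


# Usage: change one byte inside the second block of a three-block image.
baseline_image = bytes(BLOCK_SIZE * 3)
modified_image = bytearray(baseline_image)
modified_image[5000] = 1
to_back_up = changed_blocks(
    block_hashes(baseline_image), block_hashes(bytes(modified_image))
)
```

Only the block containing the modified byte needs to be copied on the next run, which is why incremental image backups finish so quickly. Note that nothing in this process records which files live in which blocks.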

The problem with the image approach is that, while vendors provide individual file restores, the administrator needs to know exactly which backup set contains the file they are looking for. The software doesn’t provide a way for organizations to search for specific files across backup sets.

The lack of granular file knowledge creates challenges for organizations looking to maintain compliance with data privacy legislation like the European Union’s General Data Protection Regulation (GDPR) or the California Consumer Privacy Act (CCPA). Without file-level knowledge, complying with the “right to be forgotten” sections of the legislation is very difficult.

Another challenge is that image-based backups are difficult to archive, which makes it difficult to lower secondary storage costs. The baseline image must always be available because every subsequent block-level backup is relative to it. Most software solutions using block-level incremental backup also have a limit on how many iterations away from the baseline image they can track before performance suffers.

The Archive Alternative

An alternative is to copy all unstructured data to an archive and let archive software manage the data. Most archive solutions have a very specific understanding of every file and every version of every file in the archive. Finding and removing data in response to a “right to be forgotten” request is straightforward. Archive solutions can also manage where the archived data is stored, and most support multiple storage tiers so that data moves to less expensive tiers as it ages and becomes cheaper to store. These solutions can also remove old data from primary storage, lowering primary storage costs.
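The age-based tiering described above can be sketched as a simple policy lookup. The tier names and the age thresholds below are hypothetical, picked only to make the example concrete. An actual archive product would expose its own policy settings.

```python
from datetime import date

# Hypothetical tiers as (minimum age in days, tier name), oldest threshold first.
TIERS = [(365, "cold"), (90, "warm"), (0, "hot")]


def pick_tier(last_modified: date, today: date) -> str:
    """Return the least expensive tier whose age threshold the file has reached."""
    age_days = (today - last_modified).days
    for min_age, tier in TIERS:
        if age_days >= min_age:
            return tier
    return TIERS[-1][1]  # future-dated files fall back to the hot tier


# Usage: a file last touched about nine months before the evaluation date.
tier = pick_tier(date(2018, 1, 1), today=date(2018, 10, 18))
```

Because the policy works on individual files, each file can move to a cheaper tier independently of the others.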

The Archive Problem

The problem with the archive is how to get ALL data to it, quickly and consistently. Many archive solutions don’t have a transfer mechanism; they count on the administrator to move data to the archive manually. Some archive solutions do have an automatic file transfer capability, but they were not designed to transfer files en masse the way backup solutions are. Also, archive solutions typically have no communication path to the backup process. For example, the archive can’t confirm that the backup process has X number of copies of a file prior to removing that file from primary storage.
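The missing check is easy to state in code. In the minimal sketch below, `copies_by_path` stands in for whatever the backup process would report about protected copies. That mapping, the function names, and the threshold are all assumptions for illustration, since the article does not describe a specific interface.

```python
def safe_to_remove(path: str, copies_by_path: dict[str, int], required_copies: int) -> bool:
    """Return True only if the backup side reports enough protected copies of the file."""
    return copies_by_path.get(path, 0) >= required_copies


def removable_files(candidates: list[str], copies_by_path: dict[str, int],
                    required_copies: int) -> list[str]:
    """Keep only the archive candidates that meet the copy requirement."""
    return [p for p in candidates if safe_to_remove(p, copies_by_path, required_copies)]


# Usage: only files with at least two protected copies may leave primary storage.
reported_copies = {"/projects/2015/plan.docx": 3, "/projects/2016/budget.xlsx": 1}
cleared = removable_files(
    ["/projects/2015/plan.docx", "/projects/2016/budget.xlsx", "/projects/2017/new.txt"],
    reported_copies,
    required_copies=2,
)
```

A file the backup process has never seen counts as zero copies, so it stays on primary storage.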

Integrating Backup and Archive

The solution is to integrate backup and archive. Vendors need to create solutions that perform high-speed, file-by-file backups so data is protected. Then they need to add an archive function that builds a rich metadata index of the data under management so that it is easy to search for data within the repository. An archive function enables the movement of data across secondary repositories and, eventually, off primary storage, driving down costs. IT can move old data from primary storage with the confidence of knowing it is protected X number of times.
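A metadata index of this kind can be pictured as a catalog that records every file version seen by every backup run. The sketch below is a minimal in-memory version. The `FileVersion` and `Catalog` names, the fields, and the wildcard search are assumptions for illustration, not a description of any vendor's index.

```python
from collections import defaultdict
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class FileVersion:
    path: str       # file path as seen on primary storage
    backup_id: str  # hypothetical identifier of the backup run holding this version
    sha256: str     # content digest, used to tell versions apart


class Catalog:
    """File-level index spanning all backup runs."""

    def __init__(self) -> None:
        self._by_path: dict[str, list[FileVersion]] = defaultdict(list)

    def record(self, version: FileVersion) -> None:
        """Register one version of one file, as a file-by-file backup would."""
        self._by_path[version.path].append(version)

    def search(self, pattern: str) -> list[FileVersion]:
        """Return every version whose path matches a shell-style wildcard pattern."""
        return [
            version
            for path in sorted(self._by_path)
            if fnmatchcase(path, pattern)
            for version in self._by_path[path]
        ]


# Usage: locate every stored version of one person's files across backup runs.
catalog = Catalog()
catalog.record(FileVersion("/hr/jane_doe.pdf", "mon", "aaa"))
catalog.record(FileVersion("/hr/jane_doe.pdf", "tue", "bbb"))
catalog.record(FileVersion("/hr/john_roe.pdf", "mon", "ccc"))
matches = catalog.search("*jane_doe*")
```

With a catalog like this, an administrator no longer needs to know which backup set holds a file, and a “right to be forgotten” request becomes a search followed by a deletion of the matching versions.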

The integration of backup and archive enables the organization to protect its data, comply with data privacy regulations, and reduce its investment in both primary and secondary storage.

About George Crump

George Crump is the Chief Marketing Officer at VergeIO, the leader in Ultraconverged Infrastructure. Prior to VergeIO he was Chief Product Strategist at StorONE. Before assuming roles with innovative technology vendors, George spent almost 14 years as the founder and lead analyst at Storage Switzerland. In his spare time, he continues to write blogs on Storage Switzerland to educate IT professionals on all aspects of data center storage. He is the primary contributor to Storage Switzerland and is a heavily sought-after public speaker. With over 30 years of experience designing storage solutions for data centers across the US, he has seen the birth of such technologies as RAID, NAS, SAN, Virtualization, Cloud, and Enterprise Flash. Before founding Storage Switzerland, he was CTO at one of the nation's largest storage integrators, where he was in charge of technology testing, integration, and product selection.
