Backup Retention vs. Data Retention

A recent blog, “Healthcare overspends on long term backup retention” by Veeam’s Jonathan Butz, discussed how healthcare organizations are creating overly complex and expensive backup infrastructure because they treat backup as part of their records retention policy. Butz is right on the mark, but I’d like to expand the scope of the conversation beyond just healthcare, as organizations of all types are overspending on backup because they are confusing backup retention with data retention.

In most cases, organizations need to recover data because the primary copy of data is unavailable. The reasons for this unavailability range from server or storage system outages to some form of data corruption. The primary objective of most recovery requests is to bring the server, storage system or application back to its last known good state. Using backup to facilitate the recovery means restoring the most recent copy of data within the backup architecture. In our experience, 95% of recoveries follow this scenario: recover from the most recent backup as fast as possible.

If backup architectures were designed with this scenario in mind, backup storage capacity would be about 2.5 times that of production: enough room to store two full copies plus the changes between those copies. In reality, most backup architectures are 10X the size of production storage, if not more, and the gap between the two is widening rapidly.
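The sizing arithmetic above can be sketched in a few lines. This is only an illustration: the function name and the 100 TB production figure are hypothetical, and the 0.5 change fraction is simply what is left of the 2.5X figure after two full copies.

```python
def backup_capacity_tb(production_tb, full_copies=2, change_fraction=0.5):
    """Capacity for a recovery-focused backup tier.

    Holds `full_copies` full copies of production plus the changes
    between them, expressed as a fraction of production capacity.
    """
    return production_tb * (full_copies + change_fraction)


production_tb = 100.0  # hypothetical production footprint

recovery_sized = backup_capacity_tb(production_tb)  # 2.5X of production
commonly_seen = production_tb * 10                  # the 10X figure cited above

print(f"Recovery-sized backup tier: {recovery_sized:.0f} TB")
print(f"Typical 10X backup tier:    {commonly_seen:.0f} TB")
print(f"Excess capacity:            {commonly_seen - recovery_sized:.0f} TB")
```

For every 100 TB of production data, the difference between the two designs is 750 TB of backup capacity that has to be bought, powered and managed.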

Why is Backup Storage So Big?

Most organizations retain backup data for years. It’s a policy that almost makes sense if the organization actively grooms its production storage, removing old datasets, both old files and old records within databases. However, countless storage audits confirm that most organizations also keep all data on their production storage for years. They also tend to replicate data from the most critical systems to another system off-site for disaster recovery purposes.

While we can argue the logic of keeping all data on production storage for years, keeping it on both production storage and backup storage for years is redundant. If there are records in your organization’s database application that are five or ten years old, then the moment you back that data up, your backup contains five or ten years of retained data even though the backup is only a few moments old. Keeping unchanging data on production storage, and for an equal period of time in backup, has almost no value to the organization, but it dramatically increases the cost to equip and manage the backup infrastructure.

The answer is to use the backup infrastructure for what it is good at: recovering the most recent, or fairly recent, copy of data. The point-in-time capabilities of the backup process are critical for recovering from ransomware and other problems that creep into production storage, but even in those cases, it is unlikely that the organization will access backup data older than a few days.

If the organization wants to establish a data retention strategy which truly manages data, that is a different process, which is outside of the backup architecture. Data management means moving, not copying, data from production storage after a specific period of time and then retaining that data in an archive storage area for another specific period of time. A data management strategy will reduce the cost of production storage and backup storage. It will also simplify backup operations, since the backup software no longer has to track millions of old files.
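As a rough illustration of "moving, not copying," the sketch below relocates files that have not been modified within a given period from a production tree to an archive tree. It is a minimal sketch under stated assumptions, not a description of any product: the function name, the use of file modification time as the age test, and the requirement that the archive tree sits outside the production tree are all assumptions made for the example.

```python
import shutil
import time
from pathlib import Path


def archive_old_files(production_root, archive_root, max_age_days, now=None):
    """Move (not copy) files older than max_age_days into the archive tree.

    Age is judged by last-modified time, which is an assumption of this
    sketch. Returns the relative paths of the files that were moved.
    """
    production_root = Path(production_root)
    archive_root = Path(archive_root)
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day

    moved = []
    # sorted() materializes the listing so moves do not disturb the walk.
    for path in sorted(production_root.rglob("*")):
        if not path.is_file():
            continue
        if path.stat().st_mtime >= cutoff:
            continue  # still recent; leave it on production storage
        relative = path.relative_to(production_root)
        target = archive_root / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target))  # file leaves production storage
        moved.append(relative)
    return moved
```

Because the file is moved rather than copied, it drops out of both production storage and every subsequent backup of production, which is where the savings come from. A real data management process would also enforce how long the archive copy is kept before deletion.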

Conclusion

Backup applications are better at retaining data than ever before. Vendors have designed strategies to enable their backup databases to scale to track the millions of files they manage. Vendors like Veeam have implemented a tiering capability that enables organizations to save money by storing backups on less expensive object storage systems or save infrastructure investments by storing them in the cloud.

Why then does a company like Veeam also promote sanity in backup storage? Veeam is a software company; they don’t make money by selling storage that organizations shouldn’t need. However, many other backup vendors do, including some cloud backup vendors. While we think there is potential today for backup software companies to manage data better and provide archive-like capabilities, IT needs to remain cognizant that backup isn’t archive, as we discussed in our blog “Backup vs. Archive.”

George Crump is the Chief Marketing Officer at VergeIO, the leader in Ultraconverged Infrastructure. Prior to VergeIO he was Chief Product Strategist at StorONE. Before assuming roles with innovative technology vendors, George spent almost 14 years as the founder and lead analyst at Storage Switzerland. In his spare time, he continues to write blogs on Storage Switzerland to educate IT professionals on all aspects of data center storage. He is the primary contributor to Storage Switzerland and is a heavily sought-after public speaker. With over 30 years of experience designing storage solutions for data centers across the US, he has seen the birth of such technologies as RAID, NAS, SAN, Virtualization, Cloud, and Enterprise Flash. Before founding Storage Switzerland, he was CTO at one of the nation's largest storage integrators, where he was in charge of technology testing, integration, and product selection.

2 comments on “Backup Retention vs. Data Retention”
  1. StoneFly Inc says:

    Data retention is the process by which a business decides when it is the appropriate time to delete any given piece of information. Some data, such as corporate bylaws and legal documents, may need to be retained indefinitely. However, other data may need to be removed from business systems sooner for reasons of cost containment, limitation of liability and maintaining sane operations for your business.

    The purpose of data backups is to prevent failures in your data retention plan. Commonly a data backup will be used to:

    • Restore a document which was accidentally or improperly deleted by an employee
    • Restore a document which was corrupted or damaged due to a hardware or software failure
    • Restore a document which was corrupted or stolen by a malicious party or hacker (e.g. ransomware)

  2. StoneFly Inc says:

    A complete backup and disaster recovery system is required for a business to back up and save its data.

Comments are closed.
