A recent blog “Healthcare overspends on long term backup retention” by Veeam’s Jonathan Butz discussed how healthcare organizations are creating overly complex and expensive backup infrastructure because they are considering backup as part of their records retention policy. Butz is right on the mark, but I’d like to expand the scope of the conversation beyond just healthcare, as most organizations of all types are overspending on backup because they are confusing backup retention with data retention.
In most cases, organizations need to recover data because the primary copy of data is unavailable. The reasons for this unavailability range from server or storage system outages to some form of data corruption. The primary objective of most recovery requests is to bring the server, storage system or application back to its last known good state. Using backup to facilitate the recovery means restoring the most recent copy of data within the backup architecture. In our experience, 95% of recoveries follow this scenario: recover from the most recent backup as fast as possible.
If backup architectures were designed with this scenario in mind, backup storage capacity would be about two and a half times that of production: enough room to store two full copies plus the changes between those copies. The reality is that most backup architectures are 10X the size of production storage, if not more, and the gap between the two is widening rapidly.
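To make the sizing arithmetic concrete, here is a minimal sketch. It assumes the changes kept between the two full copies add up to about half of production capacity, which is what the two-and-a-half-times figure implies. The function name, its parameters and the 100 TB production footprint are hypothetical, chosen only for illustration.

```python
def backup_capacity_tb(production_tb, full_copies=2, change_fraction=0.5):
    """Estimate backup capacity for a recovery-focused design.

    change_fraction is an assumed figure: the changed data kept between
    the full copies, expressed as a fraction of production capacity.
    """
    if production_tb < 0:
        raise ValueError("production_tb must be non-negative")
    return production_tb * (full_copies + change_fraction)


production_tb = 100  # hypothetical production footprint in TB
lean = backup_capacity_tb(production_tb)
typical = production_tb * 10  # the 10X figure cited in the text
print(f"recovery-focused: {lean:.0f} TB, typical: {typical:.0f} TB")
```

For the assumed 100 TB of production data, the recovery-focused design needs 250 TB of backup capacity, against 1,000 TB at the 10X ratio.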
Why is Backup Storage So Big?
Most organizations retain backup data for years. It’s a policy that almost makes sense if the organization actively grooms its production storage, removing old datasets (both old files and old records within databases). Countless storage audits confirm, however, that most organizations also keep all data on their production storage for years. They also tend to replicate data from their most critical systems to another system off-site for disaster recovery purposes.
We can argue about the logic of keeping all data on production storage for years, but keeping it on both production storage and backup storage for years is redundant. If there are records in your organization’s database application that are five or ten years old, then the moment you back that data up, your backup contains five to ten years of retained data even though the backup is only a few moments old. Keeping unchanging data on production storage, and for an equal period of time in backup, has almost no value to the organization, but it dramatically increases the cost to equip and manage the backup infrastructure.
The answer is to use the backup infrastructure for what it is good at: recovering the most recent, or fairly recent, copy of data. The point-in-time capabilities of the backup process are critical for recovering from ransomware and other problems that creep into production storage, but even in those cases, it is unlikely that the organization will access backup data older than a few days.
If the organization wants to establish a data retention strategy which truly manages data, that is a different process, which is outside of the backup architecture. Data management means moving, not copying, data from production storage after a specific period of time and then retaining that data in an archive storage area for another specific period of time. A data management strategy will reduce the cost of production storage and backup storage. It will also simplify backup operations, since the backup software no longer has to track millions of old files.
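The "move, not copy" idea can be sketched for plain files as follows. This is a simplified illustration, not how any particular archive product works: it assumes file modification time is an acceptable age signal and that the archive location sits outside the production directory. The function name and its parameters are hypothetical.

```python
import os
import shutil
import time


def archive_old_files(production_dir, archive_dir, max_age_days, now=None):
    """Move files not modified within max_age_days into archive_dir.

    Returns the sorted list of relative paths that were moved.
    Assumes archive_dir is not located inside production_dir.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    moved = []
    for root, _dirs, files in os.walk(production_dir):
        for name in files:
            src = os.path.join(root, name)
            if os.path.getmtime(src) >= cutoff:
                continue  # still recent, so it stays on production
            rel = os.path.relpath(src, production_dir)
            dst = os.path.join(archive_dir, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.move(src, dst)  # move, not copy: production shrinks
            moved.append(rel)
    return sorted(moved)
```

Because the files leave production storage, they also drop out of subsequent backups, which is where the savings on both tiers come from. Retention and eventual deletion within the archive would be a separate step that is not shown here.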
Backup applications are better at retaining data than ever before. Vendors have designed strategies to enable their backup databases to scale to track the millions of files they manage. Vendors like Veeam have implemented a tiering capability that enables organizations to save money by storing backups on less expensive object storage systems or save infrastructure investments by storing them in the cloud.
Why then does a company like Veeam also promote sanity in backup storage? Veeam is a software company; they don’t make money by selling storage that organizations shouldn’t need. However, many other backup vendors do, including some cloud backup vendors. While we think there is potential today for backup software companies to manage data better and provide archive-like capabilities, IT needs to remain cognizant that backup isn’t archive, as we discussed in our blog “Backup vs. Archive.”