Are Snapshots Enough to Protect Unstructured Data?

Posted on July 10, 2018 by George Crump

Unstructured data is hard to protect. It is growing at alarming rates, and it is a prime target of cyber-threats like ransomware. Even the makeup of the data is problematic: unstructured data sets often consist of millions, and in some cases billions, of files. In an attempt to overcome these challenges, many data centers now count on snapshots to protect unstructured data. The problem is that snapshots, while they have a role to play, are not well suited to meet the protection requirements of unstructured data.

The Single-Point-of-Failure Problem

The first problem when using snapshots to protect data is that the snapshot resides on the same storage system as the production copy of data. A failure of that storage system means not only the loss of production data but also the loss of all the “backup” copies. While snapshots are fine for a quick copy of data, it is critical to use the snapshot to create a copy of that data on a secondary system.

The Search Problem

The second problem when using snapshots to protect data is a search problem. While most storage systems on the market today can manage hundreds if not thousands of snapshots, almost none of those systems provide any form of granular search within the snapshot. Snapshots are well suited to restoring the latest protected copy of the requested data, but not to fulfilling a request to find the fifth version of a file that is four weeks old.
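To make the search gap concrete, here is a minimal sketch of the kind of version catalog a backup product might keep alongside its copies. It is an illustration only: the function names, the (timestamp, path-to-hash) snapshot listing and the hash strings are all assumptions made for this example, not the interface of any particular storage system.

```python
from collections import defaultdict
from datetime import datetime


def build_catalog(snapshots):
    """Index file versions across snapshot listings.

    snapshots: iterable of (taken_at, {path: content_hash}) pairs.
    This listing format is hypothetical, chosen only for the sketch.
    """
    catalog = defaultdict(list)
    # Walk snapshots oldest first so versions are recorded in order.
    for taken_at, files in sorted(snapshots, key=lambda s: s[0]):
        for path, digest in files.items():
            versions = catalog[path]
            # Record a new version only when the content actually changed.
            if not versions or versions[-1][1] != digest:
                versions.append((taken_at, digest))
    return catalog


def nth_version(catalog, path, n):
    """Return the n-th recorded version (1-based) of a file, or None."""
    versions = catalog.get(path, [])
    return versions[n - 1] if 1 <= n <= len(versions) else None


listings = [
    (datetime(2018, 6, 1), {"/docs/plan.txt": "h1"}),
    (datetime(2018, 6, 8), {"/docs/plan.txt": "h2"}),
    (datetime(2018, 6, 15), {"/docs/plan.txt": "h2"}),  # unchanged
]
catalog = build_catalog(listings)
second = nth_version(catalog, "/docs/plan.txt", 2)
```

With an index like this, a request for a specific version of a file becomes a lookup instead of a manual walk through every snapshot. Most snapshot implementations keep no such per-file index, which is the gap described above.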

The Snapshot Capacity Problem

Snapshots, when first executed, take almost no additional storage capacity because the only thing copied is volume or filesystem metadata. As the snapshot ages, however, the system has to preserve, one way or another, every block that changes after the snapshot was taken, so capacity consumption increases. Week-old or month-old snapshots, especially on very active filesystems, can consume quite a bit of disk capacity. The problem is that the actual capacity consumption of an old snapshot is very hard to track and even harder to predict.
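A toy model helps show why snapshot capacity is so hard to predict. The sketch below assumes a simple copy-on-write scheme in which the original content of a block is preserved the first time that block is overwritten after a snapshot. The class and method names are hypothetical, and real storage systems differ in the details (redirect-on-write, shared blocks between snapshots and so on).

```python
class ToyVolume:
    """A toy block volume with copy-on-write snapshots (illustrative only)."""

    def __init__(self, num_blocks):
        # Map each block id to a version counter standing in for its content.
        self.blocks = {i: 0 for i in range(num_blocks)}
        # One dict per snapshot: block id -> preserved original version.
        self.snapshots = []

    def snapshot(self):
        # Taking a snapshot copies no data; it only starts tracking changes.
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, block_id):
        old = self.blocks[block_id]
        # Preserve the old content for each snapshot that has not yet
        # saved this block; later overwrites of the same block cost nothing.
        for preserved in self.snapshots:
            preserved.setdefault(block_id, old)
        self.blocks[block_id] = old + 1

    def preserved_blocks(self, snap_id):
        # Number of blocks this snapshot is holding on to.
        return len(self.snapshots[snap_id])


vol = ToyVolume(1000)
snap = vol.snapshot()
at_creation = vol.preserved_blocks(snap)    # nothing preserved yet

for _ in range(2):                          # overwrite blocks 0..99 twice
    for block in range(100):
        vol.write(block)
after_writes = vol.preserved_blocks(snap)   # 100 distinct blocks preserved
```

In this model, consumption depends on how many distinct blocks change while the snapshot exists, not on how many writes occur or how old the snapshot is. That depends on the workload, which is why the figure is difficult to forecast.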

The Snapshot Integration Problem

The final problem with snapshots is that their integration with other components of the data protection process is limited. While some backup software can trigger a snapshot, back up the data from the snapshot and then delete the snapshot, the capability is rare and often found in only the most high-end backup solutions. Moreover, even these solutions integrate with only a handful of data protection products.

Use Snapshots for What They Were Designed

Snapshots, as a technology, were never intended to be a long-term backup. Instead, the intent was to use them, as the name implies, as a short-term representation of data at a moment in time. The system should quickly copy each snapshot to another data protection system and then delete the original snapshot.
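The snapshot, copy, delete sequence can be sketched as follows. This is a minimal in-memory illustration, not a real integration: ToyPrimary, ToySecondary and protect are hypothetical names invented for the example, and the checksum step is an added assumption about how a copy might be verified before the snapshot is released.

```python
import hashlib


def checksum(files):
    """Hash a {path: bytes} mapping in a stable order."""
    h = hashlib.sha256()
    for path in sorted(files):
        h.update(path.encode())
        h.update(b"\0")
        h.update(files[path])
        h.update(b"\0")
    return h.hexdigest()


class ToyPrimary:
    """Stand-in for a production storage system (hypothetical)."""

    def __init__(self, files):
        self.files = dict(files)
        self.snapshots = {}

    def take_snapshot(self, name):
        self.snapshots[name] = dict(self.files)  # frozen point-in-time view

    def delete_snapshot(self, name):
        del self.snapshots[name]


class ToySecondary:
    """Stand-in for a separate data protection system (hypothetical)."""

    def __init__(self):
        self.copies = {}

    def store(self, name, files):
        self.copies[name] = dict(files)


def protect(primary, secondary, name):
    primary.take_snapshot(name)
    frozen = primary.snapshots[name]
    secondary.store(name, frozen)
    # Delete the snapshot only after the secondary copy is verified.
    if checksum(secondary.copies[name]) != checksum(frozen):
        raise RuntimeError("copy verification failed; snapshot kept")
    primary.delete_snapshot(name)


primary = ToyPrimary({"/docs/plan.txt": b"v1"})
secondary = ToySecondary()
protect(primary, secondary, "snap-001")
```

The ordering is the point of the sketch: the snapshot lives only as long as it takes to land a verified copy on a second system, so it neither accumulates capacity nor acts as the only backup.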

Modernizing Snapshots

Snapshots should be more than an external integration between two separate solutions. Instead, the backup solution should “be” the snapshot and fully manage it. That snapshot should then feed the backup solution’s secondary storage, which should in turn tier to a cloud-based storage repository and set up the organization to archive on-premises production data.

To learn more about the new requirements of unstructured data protection, watch our latest on-demand webinar, “The Three New Requirements of Unstructured Data Protection”. Attendees get immediate access to Storage Switzerland’s exclusive eBook, “Modernizing Unstructured Data Protection and Management”.

About George Crump

George Crump is the Chief Marketing Officer at VergeIO, the leader in Ultraconverged Infrastructure. Prior to VergeIO he was Chief Product Strategist at StorONE. Before assuming roles with innovative technology vendors, George spent almost 14 years as the founder and lead analyst at Storage Switzerland. In his spare time, he continues to write blogs on Storage Switzerland to educate IT professionals on all aspects of data center storage. He is the primary contributor to Storage Switzerland and is a heavily sought-after public speaker. With over 30 years of experience designing storage solutions for data centers across the US, he has seen the birth of such technologies as RAID, NAS, SAN, Virtualization, Cloud, and Enterprise Flash. Before founding Storage Switzerland, he was CTO at one of the nation's largest storage integrators, where he was in charge of technology testing, integration, and product selection.
