The Problem with Image Backup for Unstructured Data

Unstructured data presents two challenges to the typical backup process. First, it is often the largest data set, in terms of capacity, that the backup application needs to protect. Second, and potentially more problematic, is the sheer number of files. An unstructured data store can hold hundreds of thousands, millions, and in some cases billions of files. Finding the files that are new or have changed since the last backup can take more time than actually transferring them to backup storage. The time it takes to scan unstructured data leads many backup vendors to use image backups, but image backups of unstructured data have problems of their own.
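
To make the scanning cost concrete, here is a minimal Python sketch of a file-by-file incremental scan. It assumes changes are detected by comparing each file's modification time to the time of the last backup, which is only one of several possible detection methods. The function name and parameters are hypothetical and do not come from any particular backup product.

```python
import os


def find_changed_files(root, last_backup_time):
    """Walk every file under root and return (changed_paths, files_examined).

    Every file has to be stat()-ed even when nothing has changed, so the
    scan cost grows with the file count, not with the amount of changed data.
    """
    changed = []
    examined = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            examined += 1
            # Assumption: a newer modification time means the file changed.
            if os.stat(path).st_mtime > last_backup_time:
                changed.append(path)
    return sorted(changed), examined
```

Even if only one file has changed, the loop still touches every file in the tree, which is why the scan can take longer than the data transfer.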

The Image Backup Advantage

Image backups operate a level below the file system. As a result, the number of files the file system stores does not affect them. The first image backup is a block-by-block copy of the volume. Subsequent image backups, assuming the backup software or operating system supports it, are block-level incremental (BLI) backups. A BLI backup transfers only the blocks that change when a file is modified or when a user or application creates a new file. Both the full image backup and the BLI backup can complete in a fraction of the time that a file-by-file scan of the file system takes.
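
The idea behind a BLI backup can be sketched in a few lines of Python. This is a simplified illustration, not how any specific product works: it finds changed blocks by comparing per-block hashes against the previous backup, whereas real products often rely on a changed-block tracking facility in the operating system or hypervisor. The block size and function names are assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative block size, not taken from any product


def block_hashes(volume, block_size=BLOCK_SIZE):
    """Return one SHA-256 digest per fixed-size block of the volume bytes."""
    return [
        hashlib.sha256(volume[i:i + block_size]).hexdigest()
        for i in range(0, len(volume), block_size)
    ]


def changed_blocks(previous_hashes, volume, block_size=BLOCK_SIZE):
    """Return {block_index: block_bytes} for blocks that differ from the last backup."""
    delta = {}
    for index, digest in enumerate(block_hashes(volume, block_size)):
        # A block is new if it lies past the end of the old image,
        # and changed if its digest no longer matches.
        if index >= len(previous_hashes) or previous_hashes[index] != digest:
            delta[index] = volume[index * block_size:(index + 1) * block_size]
    return delta
```

Nothing in this sketch knows about files or directories. The work depends on the number of blocks in the volume, not on how many files the file system holds.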

The Image Backup Problem

The problem with image backups of unstructured data is that they lose the granular understanding of the files they are protecting. While most image-based backup solutions do provide individual file recovery, the restoration must come from a known backup job. An administrator typically can’t search for a specific file across multiple image-based backup jobs. In practice, the only individual restore requests that image-based backups handle well are recoveries from the most recent backup.
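
To show what a cross-job search requires, the following Python sketch builds a simple catalog that maps each file path to the backup jobs containing it. It is a hypothetical illustration of the file-level metadata that an image backup does not naturally keep. The job identifiers, paths, and function names are invented for the example.

```python
from collections import defaultdict


def build_catalog(jobs):
    """Build {file_path: [job_ids]} from {job_id: iterable of file paths}."""
    catalog = defaultdict(list)
    for job_id in sorted(jobs):
        for path in jobs[job_id]:
            catalog[path].append(job_id)
    return dict(catalog)


def find_file(catalog, path):
    """Return every backup job that holds a copy of the given path."""
    return catalog.get(path, [])
```

With a catalog like this, a single lookup answers "which jobs contain this file?" Without it, the administrator has to know which job to open before the search can begin.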

Another challenge with image backups is removing data from within the image, a capability that many believe is a requirement of data privacy legislation like the European Union’s General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA). Among the extensive requirements to protect and retain data, these regulations also include specific requirements to remove a consumer’s personal information at that consumer’s request, often referred to as “the right to be forgotten.”

Image backups need all the copied blocks of the volume to be available. Removing one block from that backup corrupts the backup file, and the entire backup becomes invalid. Some vendors argue that as long as the backup application removes data belonging to a “forgotten” user as it restores other data, the application complies with the regulations. At this point, however, there is no case law to support that point of view, and for the most part it conflicts with the regulations.
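
A small Python sketch shows why cutting a block out of an image is destructive. It assumes the image is an ordered list of blocks validated by a single whole-image checksum. That is an assumption made for illustration, since the validation method varies by product, and the function names are hypothetical.

```python
import hashlib


def image_checksum(blocks):
    """Compute one SHA-256 digest over all blocks, in order."""
    digest = hashlib.sha256()
    for block in blocks:
        digest.update(block)
    return digest.hexdigest()


def verify_image(blocks, expected_checksum):
    """Return True only if every block is present, unchanged, and in order."""
    return image_checksum(blocks) == expected_checksum
```

Under this assumption, dropping any single block changes the computed digest, so `verify_image` fails for the whole image and not just for the file that owned the block.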

What Should IT Professionals Do?

IT managers need to reconsider how they are protecting unstructured data. It is a data set that is not only growing in size but also increasing in criticality to the organization. IT probably needs to invest either in advanced, high-speed file-by-file technology or in the more aggressive use of archive technologies. In our next entry, we’ll discuss the pros and cons of file-by-file backup and how vendors can improve its performance and capabilities to make it a more viable option for unstructured data protection.

Storage Switzerland and Aparavi recently recorded an in-person presentation called “Are you Treating Unstructured Data like a Second Class Citizen.” You can watch an on-demand version of that presentation by registering here. Attached to the presentation is an exclusive white paper entitled “It’s Not IF your Backup Software is Using Cloud Storage, It’s HOW!” In the paper, we cover another challenge with image-based unstructured data backup: inefficient use of cloud storage, which needlessly forces the growth of the on-premises data storage footprint. As soon as the presentation starts playing, you can download this valuable asset.

Twelve years ago, George Crump founded Storage Switzerland with one simple goal: to educate IT professionals about all aspects of data center storage. He is the primary contributor to Storage Switzerland and is a heavily sought-after public speaker. With over 25 years of experience designing storage solutions for data centers across the US, he has seen the birth of such technologies as RAID, NAS and SAN, Virtualization, Cloud and Enterprise Flash. Prior to founding Storage Switzerland, he was CTO at one of the nation's largest storage integrators, where he was in charge of technology testing, integration and product selection.
