Most data protection applications are not backing up unstructured data correctly. Unstructured data has grown massively, not only in capacity but also in the number of files within each data set, and both make it difficult to protect. The large total size and the high file count force most data protection vendors to move to an image-based data protection method to capture this data in a timely manner. Image-based protection enables the vendor to get around the file-count problem but severely limits the backup software’s visibility into the data. This lack of insight into the contents of the data may put organizations in violation of compliance regulations.
For years, most data protection applications protected unstructured data storage platforms like file servers and network-attached storage (NAS) one file at a time. The software would log into the file system and scan it from top to bottom, finding files that were new or modified since the last backup. The file-by-file approach was slow to begin with, but as the number of files started to increase, the time required to protect unstructured data became untenable.
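To make the cost of that approach concrete, here is a minimal Python sketch of a file-by-file incremental scan. It is an illustration only, not any vendor's implementation. The function name find_changed_files and the use of modification time as the change signal are assumptions made for the example.

```python
import os

def find_changed_files(root, last_backup_time):
    """Walk the tree under root and return files modified after last_backup_time.

    last_backup_time is a POSIX timestamp (seconds since the epoch).
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if mtime > last_backup_time:
                changed.append(path)
    return sorted(changed)
```

Note that every file is examined on every run, even when almost nothing has changed. The scan time grows with the total number of files rather than with the amount of changed data, which is why this method struggles as file counts climb.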
To work around the file-by-file issue, most vendors resorted to protecting data at an image level. Image-level backups work by creating an image of the entire volume on a backup device and then, as blocks change on the original volume, updating the image on backup storage. Most image-based backups can still provide single-file restores, but the customer needs to know precisely which backup set has the data they are after.
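The block-level idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the volume and the image are the same fixed size, changed blocks are found by direct comparison (real products typically rely on some form of changed-block tracking instead), and the name sync_image and the 4 KB block size are hypothetical.

```python
BLOCK_SIZE = 4096  # assumed block size for the example

def sync_image(image, volume, block_size=BLOCK_SIZE):
    """Copy blocks that differ from volume (bytes) into image (bytearray).

    Returns the list of block numbers that were updated.
    """
    if len(image) != len(volume):
        raise ValueError("this sketch assumes a fixed-size volume")
    changed = []
    for start in range(0, len(volume), block_size):
        block = volume[start:start + block_size]
        if image[start:start + block_size] != block:
            image[start:start + block_size] = block  # overwrite in place
            changed.append(start // block_size)
    return changed
```

The sketch never opens a file or reads a file name. All it knows is which block numbers changed, which is what makes the method fast and also what removes any file-level context from the backup.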
The result is a loss in the visibility of files across the backup set. With image-based backup methods, there is no way to find all the files created or modified by a particular user over a given time. There is also no way to build a content level index of the data contained within these files. Consequently, there is no way to identify personally identifiable information (PII) data within the backup set.
The lack of context about the protected unstructured data makes it difficult for IT to fulfill the most common restore request: the restoration of user files that were accidentally deleted or overwritten. Users often don’t remember the exact name or location of a file, and they have no insight into when the backup software completed the last good backup of a particular file. This seemingly simple restore request can take hours of hunting and pecking to find the right file for recovery, wasting valuable IT time.
Image-based backups are similar to a house of cards, with each block being a card in the house. A card can’t be removed without the house collapsing: if a block of data is deleted, the entire image is corrupted. That matters because the ability to extract, and if necessary remove, specific data from the backup set is increasingly essential under regulations like the EU General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), which can require an organization to delete an individual’s personal data on request.
Organizations Need Unstructured Data Backup Visibility
Data protection vendors’ choice of image-based unstructured data protection instead of improving file-by-file backups is causing challenges for organizations trying to meet regulations like GDPR or even to fulfill internal recovery requests. Image-based backups are ideal for protecting applications and virtual machines, where per-file visibility isn’t needed, but for unstructured data, IT planners need to look for purpose-built protection. These solutions run alongside the existing backup solution, which continues to protect the application and virtual environment.
The purpose-built data protection solution should improve file-by-file backup performance, especially after the initial scan, so that performance is not an issue. These solutions should also provide more than metadata “basics” like file owner, create date and modify date. Instead, purpose-built unstructured data solutions should provide complete content-level search and data classification. This combination enables IT professionals to remove data from within the backup set if a regulation requires it. It also makes it easier to understand and locate PII within the backup, and to fulfill the most common restore requests.
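As a rough illustration of what metadata search plus classification makes possible, the Python sketch below keeps a small in-memory catalog of backed-up files and answers the two questions raised earlier: which files did a given user modify in a given period, and which files appear to contain PII. Everything here is hypothetical, including the FileRecord fields, the function names and the two deliberately simple patterns. A real classifier would be far more thorough.

```python
import re
from dataclasses import dataclass
from datetime import datetime

# Deliberately simple, assumed patterns for illustration only.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

@dataclass
class FileRecord:
    path: str
    owner: str
    modified: datetime
    text: str  # extracted file content, assumed to be available at backup time

def classify(record):
    """Return the set of PII labels whose pattern matches the file content."""
    return {label for label, pattern in PII_PATTERNS.items()
            if pattern.search(record.text)}

def files_by_owner(catalog, owner, start, end):
    """Paths of files owned by owner and modified between start and end."""
    return [r.path for r in catalog
            if r.owner == owner and start <= r.modified <= end]

def files_with_pii(catalog):
    """Map each path with at least one PII match to its sorted labels."""
    found = {}
    for record in catalog:
        labels = classify(record)
        if labels:
            found[record.path] = sorted(labels)
    return found
```

With a catalog like this, a restore request such as “the spreadsheet I changed sometime in January” becomes a query instead of a manual search through backup sets, and a deletion request can start from a list of affected files.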
To learn more about improving your unstructured data protection capabilities, watch our on-demand webinar “Are you Treating Unstructured Data as a Second Class Citizen”.