The European Union’s (EU) General Data Protection Regulation (GDPR) has raised awareness among both consumers and enterprises of the value of data. As discussed in a previous Storage Switzerland blog, the need for granular data privacy control is universal and extends well beyond Europe.
For storage managers, meeting more demanding and granular data protection and retention requirements requires a smart data management strategy. In particular, the ability to retain or delete individual files and sets of files becomes a key pain point as unstructured data continues to grow.
GDPR is popularizing the concept of “the right to be forgotten”: the ability of an individual to request that a company stop storing their data. The concept is straightforward, and removing an individual’s data from primary storage is relatively easy. Removing that same data from backup storage is far harder, however, primarily because of the rise of unstructured data and the media types and backup application formats used to store it.
In contrast to the rigid format of traditional structured data, such as that generated by financial systems, unstructured data is irregular: it has no predefined model, and it comes in a variety of forms. Years ago, unstructured data made up a small portion of the enterprise’s data stores and was not considered strategic; today, that equation has been flipped on its head. For example, companies are seeking to capture and analyze machine-generated log files from mobile applications to better understand application performance and customer behavior. Another example is capturing text from social media platforms and emails to understand customer preferences and activity.
The need to store and analyze rapidly growing amounts of unstructured data has contributed to greater use of image-level backup jobs, in which the storage volume itself is backed up rather than individual files. Common job-based backups do not tag data in a logical, easily understood way, and to reduce the amount of data that must be stored, companies often prune the metadata that could help tag and identify individual files. This creates a substantial headache for storage managers who need to access individual files: they typically must know which backup job contained the file in question, and backup and archive data are very difficult to search. Millions of discrete files may need to be searched or managed, especially for an enterprise handling hundreds or thousands of “right to be forgotten” requests.
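To illustrate what the pruned metadata makes possible, the sketch below keeps a hypothetical file-level catalog that records, for each backed-up file, which data subject it belongs to and which backup job captured it. With such an index, answering “where does this person’s data live in our backups?” becomes a lookup rather than a search through every job. The names `FileRecord`, `BackupCatalog`, `subject_id`, and `backup_job` are assumptions for illustration, not features of any particular backup product, and the catalog only locates files; it does not delete anything from backup media.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    path: str
    subject_id: str  # hypothetical tag identifying the data subject
    backup_job: str  # hypothetical ID of the backup job that captured the file


class BackupCatalog:
    """Hypothetical file-level index over backup jobs, keyed by data subject."""

    def __init__(self) -> None:
        self._by_subject: dict[str, set[FileRecord]] = defaultdict(set)

    def record(self, rec: FileRecord) -> None:
        # Index each file as it is backed up, instead of pruning this metadata.
        self._by_subject[rec.subject_id].add(rec)

    def locate(self, subject_id: str) -> dict[str, list[str]]:
        """Return {backup_job: [paths]} for every file tagged with subject_id."""
        result: dict[str, list[str]] = defaultdict(list)
        for rec in self._by_subject.get(subject_id, ()):
            result[rec.backup_job].append(rec.path)
        # Sort for stable, readable output.
        return {job: sorted(paths) for job, paths in sorted(result.items())}


catalog = BackupCatalog()
catalog.record(FileRecord("/mail/alice/1.eml", "alice", "job-02"))
catalog.record(FileRecord("/logs/app.log", "bob", "job-01"))
catalog.record(FileRecord("/home/alice/cv.pdf", "alice", "job-01"))
print(catalog.locate("alice"))
```

The point of the design is that the subject tag is captured at backup time; once it has been pruned, no query can recover which jobs hold a given person’s files.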
Part of the difficulty in complying with more sophisticated data privacy regulations, such as “the right to be forgotten,” stems from the common practice of keeping too much data on backup storage that was never designed to make locating and removing individual files easy.
To address this challenge, storage managers should consider integrating backup and archive policies more closely through intelligent data management and tiering. Approximately 80 to 90 percent of restores come from the most recent backup, which means a substantial majority of the enterprise’s copy data can be migrated to archive storage. The foundation for this practice is rich metadata combined with policy-based migration, which together enable file-by-file backup, tiering to archive, and restore while keeping backup windows in check.
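As a rough illustration of policy-based tiering driven by per-file metadata, the sketch below assigns each backed-up file version to either a backup tier or an archive tier. The policy itself is an assumption chosen to reflect the observation that most restores come from the most recent backup: the newest version of every file stays on backup storage, as do any versions younger than a hypothetical `keep_days` window, and everything older migrates to archive. `FileVersion`, `plan_tiering`, and the tier names are illustrative, not part of any vendor’s API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class FileVersion:
    path: str
    backed_up_at: datetime  # per-file metadata captured at backup time


def plan_tiering(
    versions: list[FileVersion], now: datetime, keep_days: int = 30
) -> dict[tuple[str, datetime], str]:
    """Map each (path, backup time) to a hypothetical 'backup' or 'archive' tier.

    Assumed policy: the newest version of each file stays on backup storage,
    as does any version newer than keep_days; all other versions go to archive.
    """
    # Find the most recent backup time for each file path.
    newest: dict[str, datetime] = {}
    for v in versions:
        if v.path not in newest or v.backed_up_at > newest[v.path]:
            newest[v.path] = v.backed_up_at

    cutoff = now - timedelta(days=keep_days)
    plan: dict[tuple[str, datetime], str] = {}
    for v in versions:
        is_latest = v.backed_up_at == newest[v.path]
        tier = "backup" if is_latest or v.backed_up_at >= cutoff else "archive"
        plan[(v.path, v.backed_up_at)] = tier
    return plan


now = datetime(2024, 6, 1)
versions = [
    FileVersion("/data/a.log", datetime(2024, 5, 31)),
    FileVersion("/data/a.log", datetime(2024, 5, 20)),
    FileVersion("/data/a.log", datetime(2024, 1, 1)),
    FileVersion("/data/b.csv", datetime(2023, 12, 1)),
]
for key, tier in plan_tiering(versions, now).items():
    print(key[0], key[1].date(), tier)
```

Because the decision is made per file version rather than per backup job, the same metadata that drives migration can also be used to find and remove an individual’s files on either tier.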
Access Storage Switzerland’s webinar with Aparavi, “Data Management vs GDPR and Data Privacy-Solve the Right to Be Forgotten Problem,” on demand to learn more.