The focus of backup is now on unstructured data instead of databases. To be clear, protection of databases is as important as it ever has been, but the backup process is probably not the right way to protect that data. Organizations have chosen different methods to protect database data, including snapshots and replication. Unstructured data is the new problem child in the data protection process, and maybe it’s time to do something other than a backup to protect that data.
The Changing Landscape of Unstructured Data
Unstructured data is no longer just user home directories full of spreadsheets and documents. It has changed dramatically. Unstructured data now consists of massive amounts of digital images and video as well as IoT data from devices and log data from systems. All this data needs to be readily accessible from the applications that may analyze it. Recreation or rekeying of data, in many cases, isn’t possible since it represents data at a specific point in time.
User data has also changed. Users create more documents, and those documents are richer, with embedded graphics and media. Users also collaborate on their data with external users and business partners, which leads to the creation of more extensive and more elaborate documents.
The Impact of Unstructured Data Changes on Backup
The changes to unstructured data make it more difficult for traditional backup technologies to protect it. From both a capacity standpoint and a quantity standpoint, there is now more unstructured data than ever. The growth in unstructured data forced many data protection software solutions to move to an image-based backup. With an image-based approach, they protect unstructured data as one big blob instead of millions of little files.
The problem with the image-based backup approach is that it loses granularity about the files it is protecting. While most backup applications do claim the ability to perform a single file restore, they do so by mounting the image that contains the file needed for the restore. The lack of granularity means the backup application loses the ability to search for specific files or operate on a group of files across backup jobs.
The lack of granularity is cause for concern for IT professionals, many of whom also count on the backup process to be the long-term archive. The lack of file granularity severely limits how the organization can manage its unstructured data.
The reason that most backup solutions moved to image-based backup has more to do with the number of files than concerns over capacity. Before image-based backups, backup solutions had to scan the file system manually, looking for files that needed protection. This process, also known as a file system walk, is very time-consuming, to the point that it takes longer to perform the walk and identify specific files needing protection than it does simply to back up all the files. Image backup takes it a step further by backing up the volume bit by bit.
Part of the solution is to return to backup’s roots: walking the file system to get a granular understanding of the files it contains. The key weakness to address is that traditional backup performs each file system walk independently of the prior walk.
A modern unstructured data protection solution should build a history so it can more quickly scan the file system and identify new or changed files. It works similarly to the way a file system journal does. The result is rapid backups that meet today’s recovery time and recovery point objectives while also delivering the granularity required for long-term retention and data management.
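The idea of keeping a history between walks can be illustrated with a minimal Python sketch. It is not how any particular backup product works; the function names (scan, diff_catalogs, save_catalog, load_catalog), the JSON catalog file, and the choice of size plus modification time as the change signal are all assumptions made for illustration. Note that this sketch still walks the whole tree on every run. The saved history only lets it avoid copying unchanged files, whereas a production solution would likely lean on change journals or similar mechanisms to shorten the walk itself.

```python
import json
import os


def scan(root):
    """Walk the tree under root and return {relative_path: (size, mtime_ns)}."""
    catalog = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            catalog[os.path.relpath(full, root)] = (st.st_size, st.st_mtime_ns)
    return catalog


def save_catalog(catalog, path):
    """Persist the catalog so the next run can compare against it."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(catalog, fh)


def load_catalog(path):
    """Load the previous catalog, or an empty one on the first run."""
    try:
        with open(path, encoding="utf-8") as fh:
            # JSON stores tuples as lists, so convert back for comparison.
            return {p: tuple(meta) for p, meta in json.load(fh).items()}
    except FileNotFoundError:
        return {}


def diff_catalogs(previous, current):
    """Return (new, changed, deleted) as sorted lists of relative paths."""
    new = sorted(p for p in current if p not in previous)
    changed = sorted(
        p for p in current if p in previous and current[p] != previous[p]
    )
    deleted = sorted(p for p in previous if p not in current)
    return new, changed, deleted
```

A backup run under these assumptions would call load_catalog, scan the tree, pass both catalogs to diff_catalogs, protect only the new and changed paths, and then call save_catalog. Because the catalog is kept per file, it is also the kind of record that makes searching for specific files across backup jobs possible.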
One of the challenges with selecting a backup solution for a specific environment, like unstructured data, is that the organization can create a data protection sprawl problem. With data protection sprawl, each environment gets its own backup application, leaving IT with a management nightmare.
The reality is the data center has always counted on multiple solutions to provide blanket coverage for the organization. What is important is that IT doesn’t let sprawl get out of hand. Providing complete coverage by managing an extra data protection application or two is reasonable. Managing five or six applications is not. IT just needs to draw the line as to what is essential and reasonable. Unstructured data, given its importance, is one of those instances.
To learn more about the new requirements for unstructured data protection, watch our latest on-demand webinar “The Three New Requirements of Unstructured Data Protection”. Attendees gain immediate access to Storage Switzerland’s exclusive eBook “Modernizing Unstructured Data Protection and Management.”