The Requirements for Modern Unstructured Data Protection

Posted on April 23, 2018 by George Crump

Unstructured data presents two challenges that organizations need to deal with: the sheer volume of data and the quantity of files in the data sets. Storing this data is a problem in and of itself, but protecting it is an entirely new problem, one that most legacy data protection solutions are ill-equipped to handle. A new wave of data protection solutions is on the way to the data center, and IT planners need to understand modern unstructured data protection requirements to see if these new solutions are up to the challenge.

Requirement 1 – Fine-Grained Backups

The first requirement of a modern data protection solution is to provide fine-grained backups. Most legacy backup solutions have tried to work around the file quantity issue, discussed in chapter 1, by doing image-based backups. While it’s true that image-based backups, especially when combined with changed block tracking technology, are a fast, efficient way to back up millions of files, they lack the file-level fidelity needed to manage that data.

Recovering an individual file from an image-based backup requires that the image be mounted and examined, and the file or files extracted from it. If the administrator knows exactly which file they are looking for and which backup job contains the version they want, then recovery is relatively straightforward. The reality, though, is that most unstructured data recoveries look nothing like this. Most of the time, a request to recover unstructured data is more like “restore all data related to project X” or “restore the third version of this file,” without knowing which backup job contains that version.

The modern unstructured data protection solution needs to back up and store data so that a recovery request can search across all the files and all the protection instances.
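To make the idea of searching across all files and all protection instances concrete, here is a minimal Python sketch of a file-level catalog. It is an illustration only, not how any particular product works: the names FileVersion, Catalog, nth_version and latest_by_tag, and the notion of tagging files with a project name, are assumptions introduced for this example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FileVersion:
    path: str                      # full path of the protected file
    job_id: str                    # backup job (protection instance) that captured it
    captured_at: datetime          # when this version was captured
    tags: frozenset = frozenset()  # hypothetical metadata, e.g. project names


class Catalog:
    """In-memory index of every file version across every backup job."""

    def __init__(self):
        self._versions = []

    def record(self, version):
        self._versions.append(version)

    def versions_of(self, path):
        """All captured versions of one file, oldest first."""
        return sorted(
            (v for v in self._versions if v.path == path),
            key=lambda v: v.captured_at,
        )

    def nth_version(self, path, n):
        """Return the n-th version (1-based) without knowing the job up front."""
        history = self.versions_of(path)
        return history[n - 1] if 1 <= n <= len(history) else None

    def latest_by_tag(self, tag):
        """Newest version of each file carrying the tag, keyed by path."""
        latest = {}
        for v in self._versions:
            if tag in v.tags:
                current = latest.get(v.path)
                if current is None or v.captured_at > current.captured_at:
                    latest[v.path] = v
        return latest
```

With an index like this, “restore the third version of this file” becomes a call to nth_version, and “restore all data related to project X” becomes a call to latest_by_tag. Neither request requires the administrator to know which backup job holds the data. A real catalog would need persistent, indexed storage to cope with millions of files.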

Requirement 2 – Frequent and Rapid Backups

Unstructured data changes frequently throughout the day. In the modern data center especially, terabytes of new information can be added to the unstructured data set within hours. Much of this data can’t be recreated, as it is the recording of conditions at a specific date and time. Unstructured data is also particularly vulnerable to user error and cyber attacks like ransomware.

Because of this vulnerability, protection of this new and updated data needs to occur more frequently than the typical once-per-night backup. That backup frequency, however, can’t break the first requirement of fine-grained backup detail. The problem is that the usual alternative to image backup is a slow walk of the file system to identify data requiring protection. In an era where millions of files are commonplace, a file system walk is impractical.

The modern unstructured data protection solution needs to deploy via a driver or agent that resides on the protected file server or interfaces with the NAS API. After the initial backup is complete, the solution needs to create and manage a journal-like system so that it can identify and protect modified files within seconds, throughout the day.
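The journal-like approach can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: ChangeJournal, on_change, drain and incremental_backup are hypothetical names, and the way a real driver, agent or NAS API delivers change notifications is not modeled here.

```python
import threading


class ChangeJournal:
    """Collects paths reported as changed by a (hypothetical) driver or agent."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = {}  # path -> latest event kind, in first-seen order

    def on_change(self, path, kind="modified"):
        """Called by the agent for every create/modify/delete notification."""
        with self._lock:
            self._pending[path] = kind

    def drain(self):
        """Atomically hand over the pending changes and start a fresh batch."""
        with self._lock:
            batch, self._pending = self._pending, {}
        return batch


def incremental_backup(journal, protect):
    """Protect only what the journal lists; no file system walk is needed."""
    batch = journal.drain()
    for path, kind in batch.items():
        protect(path, kind)
    return len(batch)
```

The point of the design is that the cost of each incremental pass scales with the number of files that changed, not with the total number of files on the system, which is what makes running it many times a day practical.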

Requirement 3 – Cloud Support

Secondary or protection storage is typically 5X the size of production storage. Given the current capacities and growth rate of unstructured data, the floor space requirements of the protection storage infrastructure may require its own data center. A third requirement for modern unstructured data protection is to provide the option to leverage cloud storage as the secondary data store. The approach should be hybrid, so that some of the data can be stored on-premises for rapid recoveries of the most recently modified data, while older data is stored in the cloud for cost-effective, long-term storage.
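As a rough illustration of both points, the Python sketch below estimates protection capacity from the 5X rule of thumb and applies a simple age-based placement rule. The 30-day on-premises window and the function names are assumptions made for this example; an actual policy would be set by each organization's recovery objectives.

```python
from datetime import datetime, timedelta

PROTECTION_MULTIPLIER = 5  # the "typically 5X" rule of thumb from the text


def protection_capacity_tb(production_tb, multiplier=PROTECTION_MULTIPLIER):
    """Estimate secondary storage needed for a given production capacity."""
    return production_tb * multiplier


def choose_tier(last_modified, now, local_window=timedelta(days=30)):
    """Keep recently modified data on-premises; send older data to the cloud.

    The 30-day default is an assumed value, not a recommendation.
    """
    return "on-premises" if now - last_modified <= local_window else "cloud"
```

Under the 5X rule, for example, 200 TB of production data implies roughly 1,000 TB of protection storage, which is why moving the older portion to the cloud can relieve on-premises floor space.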

Unstructured Data Protection Should Integrate Archive

Requirement 4 – An Archiving Future

While data protection is the immediate battle for unstructured data, data management is the war. A fourth requirement is that unstructured data protection solutions lay the groundwork for an archiving future where data can be migrated from primary storage to less expensive storage. Integrating archiving with data protection makes sense because policies can then ensure that data is not removed from production storage until two conditions hold: it has not been accessed for a specified period of time, and it has been protected (copied) a specific number of times.
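A policy of this kind reduces to a two-part check, sketched here in Python. The thresholds (one year idle, three protected copies) and the function name are assumptions chosen for illustration, not values from any specific product.

```python
from datetime import datetime, timedelta


def eligible_for_archive(last_accessed, protected_copies, now,
                         min_idle=timedelta(days=365), min_copies=3):
    """Allow removal from production storage only if both conditions hold.

    min_idle and min_copies are assumed defaults for this sketch.
    """
    idle_long_enough = now - last_accessed >= min_idle
    protected_enough = protected_copies >= min_copies
    return idle_long_enough and protected_enough
```

Because the protection system already knows how many copies of each file exist, it is in a natural position to evaluate the second condition, which a standalone archive tool would have to obtain from somewhere else.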

The first three requirements are not only necessary for unstructured data protection; they are also the necessary foundation for the fourth requirement, archiving. Without them, integrating archive doesn’t make sense, which is why for years we’ve been told that backup and archive are two separate processes.

Conclusion

It comes as no surprise to IT professionals that unstructured data is dramatically different in capacity, quantity and usage than in years past. Remarkably, the attitude toward protecting and managing unstructured data has not changed. As unstructured data continues its meteoric growth, it is time to rethink how to protect and manage it. An unstructured data protection solution that meets these requirements will position the organization not only to protect this data but also to manage it.

Sponsored by Aparavi

About George Crump

George Crump is the Chief Marketing Officer at VergeIO, the leader in Ultraconverged Infrastructure. Prior to VergeIO he was Chief Product Strategist at StorONE. Before assuming roles with innovative technology vendors, George spent almost 14 years as the founder and lead analyst at Storage Switzerland. In his spare time, he continues to write blogs on Storage Switzerland to educate IT professionals on all aspects of data center storage. He is the primary contributor to Storage Switzerland and is a heavily sought-after public speaker. With over 30 years of experience designing storage solutions for data centers across the US, he has seen the birth of such technologies as RAID, NAS, SAN, Virtualization, Cloud, and Enterprise Flash. Before founding Storage Switzerland, he was CTO at one of the nation's largest storage integrators, where he was in charge of technology testing, integration, and product selection.
