Meeting the Five Unstructured Data Backup Requirements

Posted on April 25, 2018 by George Crump

In the last blog, we laid out the five requirements for unstructured data protection: fine-grained backups, frequent and rapid backups, cloud support, data classification, and an archiving future. Aparavi is one of the first data protection companies specifically focused on the problem of backing up unstructured data stores, and it addresses each of the five requirements.

Striking the Balance – Fine Grained Backup vs. Rapid Backups

The trend in modern data protection, at least for solutions not focused on unstructured data, is to back up unstructured data using an image-based backup. Although image-based backup allows these solutions to meet the second requirement of rapid backup, it leaves them unable to meet the first, fine-grained backup. Aparavi uses a more traditional file-walk method to create the initial baseline of files, but it also creates a catalog from this walk and then uses that catalog to check quickly for new or modified files. Unlike legacy backup solutions that walk the file system every time, Aparavi walks it only once. The result is that Aparavi gets the file-level detail of the file system walk method without sacrificing backup speed.
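The catalog idea can be illustrated with a minimal Python sketch. It is not Aparavi's implementation: the function names, the (size, modification time) signature, and the assumption that some change-notification mechanism supplies a list of candidate paths are all hypothetical, since the post does not say how changes are detected after the initial walk.

```python
import os
from pathlib import Path


def build_catalog(root):
    """Walk the tree once and record (size, mtime_ns) for every file."""
    catalog = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            st = path.stat()
            catalog[str(path)] = (st.st_size, st.st_mtime_ns)
    return catalog


def files_to_back_up(catalog, candidate_paths):
    """Return candidates that are new or modified, and update the catalog.

    candidate_paths is assumed to come from a change-notification source
    (hypothetical here), so no second full walk is needed.
    """
    changed = []
    for path in candidate_paths:
        st = os.stat(path)
        signature = (st.st_size, st.st_mtime_ns)
        if catalog.get(str(path)) != signature:
            catalog[str(path)] = signature
            changed.append(str(path))
    return changed
```

After the one-time call to build_catalog, each backup run only has to stat the candidate files and compare them with the catalog, which is what keeps file-level detail from costing a full walk every time.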

Frequent Backup is more than Just Changed Block

Image-based systems back up just the changed blocks, which allows them to back up rapidly, and since each backup completes quickly, those backups can occur frequently. However, all these backups need to traverse the network. Aparavi provides sub-file level backup, which enables it to provide rapid and frequent backups. Aparavi also uses an intelligent mix of targets, protecting data on the file server first, then on an on-premises appliance, and ultimately in the cloud.
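To make "sub-file level backup" concrete, here is a small Python sketch that splits a file's contents into fixed-size chunks and reports which chunks differ between two versions. The fixed-size chunking, the 4 MiB default, and the function names are assumptions for illustration; the post does not describe how Aparavi divides files.

```python
import hashlib

DEFAULT_CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB, an arbitrary illustrative value


def chunk_hashes(data: bytes, chunk_size: int = DEFAULT_CHUNK_SIZE) -> list:
    """Return the SHA-256 digest of each fixed-size chunk of data."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def changed_chunks(old: bytes, new: bytes,
                   chunk_size: int = DEFAULT_CHUNK_SIZE) -> list:
    """Return indexes of chunks in new that are absent from or differ in old."""
    old_hashes = chunk_hashes(old, chunk_size)
    new_hashes = chunk_hashes(new, chunk_size)
    return [
        i for i, digest in enumerate(new_hashes)
        if i >= len(old_hashes) or old_hashes[i] != digest
    ]
```

Only the chunks returned by changed_chunks would need to be sent to the next target, which is why a small edit to a large file does not force the whole file across the network again.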

Cloud Storage

To alleviate the capacity requirement of the unstructured data copies made by the protection process, the solution also needs to support cloud storage, and to do so efficiently. Aparavi uses cloud storage for two purposes. First, it serves as a disaster recovery copy for any on-premises data. Second, it serves as a tier, so that organizations no longer have to keep purchasing on-premises secondary storage. Most legacy solutions use cloud storage only to create a disaster recovery copy; they do not use it as a tier to relieve on-premises storage requirements. Aparavi’s sub-file object storage and active pruning of retired data use cloud capacity in a highly efficient manner.
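One way to picture active pruning is as reference counting over stored objects: once a backup is retired, any object that no remaining backup refers to can be deleted from cloud storage. The Python sketch below uses a hypothetical data model (backups as sets of object IDs) and is not a description of Aparavi's storage format.

```python
def prune_retired(backups, stored_objects, retired):
    """Drop retired backups and find objects nothing references any more.

    backups:        dict mapping backup ID -> set of object IDs it uses
    stored_objects: iterable of all object IDs currently in cloud storage
    retired:        set of backup IDs whose retention has expired
    Returns (remaining backups, set of object IDs that can be deleted).
    """
    kept = {bid: objs for bid, objs in backups.items() if bid not in retired}
    still_referenced = set().union(*kept.values())
    return kept, set(stored_objects) - still_referenced
```

Objects shared with a surviving backup are kept, so retiring an old backup reclaims only the capacity that is truly unused.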

Data Classification

Understanding and organizing data within unstructured data sets is critical. If data can’t be found, it might as well not be stored. Aparavi allows customers to organize data by type and size, as well as by creation, modification and access dates. Additionally, customers can create their own custom tags to organize data by the device that created it (cameras and IoT) or by specific projects and use cases.
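The classification attributes listed above map naturally onto a simple record with filters. The following Python sketch is only an illustration; the FileRecord fields and the select function are hypothetical names, not part of any Aparavi interface.

```python
from dataclasses import dataclass, field
from datetime import datetime
from pathlib import PurePosixPath


@dataclass
class FileRecord:
    path: str
    size: int                      # bytes
    created: datetime
    modified: datetime
    accessed: datetime
    tags: set = field(default_factory=set)   # custom tags, e.g. "camera"

    @property
    def file_type(self) -> str:
        """File type taken from the extension, lower-cased, without the dot."""
        return PurePosixPath(self.path).suffix.lower().lstrip(".")


def select(records, *, file_type=None, min_size=None,
           accessed_before=None, tag=None):
    """Return records matching every criterion that was supplied."""
    result = []
    for rec in records:
        if file_type is not None and rec.file_type != file_type:
            continue
        if min_size is not None and rec.size < min_size:
            continue
        if accessed_before is not None and rec.accessed >= accessed_before:
            continue
        if tag is not None and tag not in rec.tags:
            continue
        result.append(rec)
    return result
```

A query such as select(records, file_type="mp4", tag="camera") would then return the video files that were tagged as coming from cameras.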

An Archiving Future

Archiving can describe many different processes. Historically, it is the process of making a special copy of data prior to removing the original data from production storage. The first step in creating an archive is creating that special copy, which, if the backup is fine-grained, the backup process can deliver and normally already does. The next step is to classify this data so policies can be set for retention and eventual data movement. A third step is to report on and provide analytics of the protected data so that IT can make decisions on what to do with it. The final step is to execute the removal process (a move is unnecessary because the copy already exists) based on those decisions, thus freeing up production storage capacity.

Aparavi has delivered on the first three steps (fine-grained backup, data classification and reporting/analytics) and will shortly deliver the last component, the actual removal of files from production storage. Nothing actually has to be moved again, since the backup process already sent the data to cloud storage. The timing is ideal, since most organizations will want to run the data protection component and build up backup history prior to any data removal occurring.

Conclusion

The Requirements for Modern Unstructured Data Protection

For the data-driven organization, unstructured data is as critical as data in production databases and today typically represents 80% or more of an organization’s total data, but modern protection of this critical asset is lacking. Given its size and criticality, organizations need to take deliberate, well-considered steps to protect and manage unstructured data. They need to compare their legacy solutions to the requirements listed in blog 4 and then see if more modern solutions like Aparavi are a better fit.

Sponsored by Aparavi