Unstructured data has been a part of the data center since there were data centers. However, over the last decade, unstructured data has fundamentally changed. What was once data consisting of office productivity documents created by users is now data created mostly by machines. Unstructured data now consumes the bulk of data center storage capacity, and it is becoming increasingly critical to the organization. Despite these dramatic changes, the method for protecting this data hasn’t kept pace.
Current Unstructured Data Protection Methods Fall Short
Today most data centers protect unstructured data in one of two ways. The first is the legacy method known as “walking the file system”: the backup software individually inspects each file in a file system to determine whether it requires backing up, then copies new or modified files to backup storage. The problem with the individual inspection approach is that it is slow; it can take longer to inspect the files than it does to copy them blindly to backup storage. Some organizations find it is faster to just back up all of the data, regardless of its status, than it is to identify the new data.
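To make the cost of walking the file system concrete, here is a minimal sketch in Python. It assumes that "new or modified" is decided by comparing each file's modification time against the time of the last backup; the function name `find_changed_files` is hypothetical, and real backup products may use other criteria such as archive bits or checksums.

```python
import os

def find_changed_files(root, last_backup_time):
    """Walk every file under root and return the ones modified after last_backup_time.

    Also returns how many files had to be inspected to find them.
    """
    changed = []
    inspected = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            inspected += 1  # every file costs a metadata lookup, changed or not
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if mtime > last_backup_time:
                changed.append(path)
    return changed, inspected
```

The `inspected` counter is the point of the sketch: the work grows with the total number of files, not with the number of files that actually changed.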
A newer and increasingly common method to protect unstructured data is for the backup software to create an image backup. This method first backs up the file system as a single entity, an image. Afterwards, it copies only the changed or added blocks of data, shrinking the size of the backup and the time required to protect the data. While the image method provides speedy backups, it lacks the granularity of the individual file inspection approach. That lack of file-level visibility may make image backups unsuitable as data privacy laws like GDPR, which can require an organization to locate or erase specific data, become more common.
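The block-level idea behind image backups can be sketched as follows. This is an illustration under stated assumptions, not how any particular product works: it detects changed blocks by comparing SHA-256 hashes of fixed-size blocks, whereas real products often rely on changed-block tracking supplied by the storage layer or hypervisor. The names `block_hashes`, `changed_blocks`, and the 4 KiB `BLOCK_SIZE` are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for the illustration

def block_hashes(image, block_size=BLOCK_SIZE):
    """Return one SHA-256 digest per fixed-size block of the image."""
    return [
        hashlib.sha256(image[offset:offset + block_size]).hexdigest()
        for offset in range(0, len(image), block_size)
    ]

def changed_blocks(previous_hashes, image, block_size=BLOCK_SIZE):
    """Return {block_index: block_bytes} for blocks that are new or differ
    from the previous backup, plus the hash list to keep for next time."""
    current = block_hashes(image, block_size)
    changed = {}
    for index, digest in enumerate(current):
        is_new = index >= len(previous_hashes)
        if is_new or previous_hashes[index] != digest:
            start = index * block_size
            changed[index] = image[start:start + block_size]
    return changed, current
```

Note that the result is keyed by block index only. Nothing in it says which file a block belongs to, which is the granularity gap described above.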
Next Generation Unstructured Data Protection
Requirement #1: Rapid, Granular Backups
The first requirement of protecting unstructured data is to ensure frequent and rapid backups while at the same time maintaining a granular understanding of file information. Unstructured data is a favorite target of ransomware attacks. Machine-generated unstructured data often can’t easily be re-created, so infection or deletion often means permanent loss. The problem is that rapid backups and granular backups are at odds with each other when using traditional techniques like individual file inspection and image backups.
The next generation data protection solution needs to use a logging approach: after the initial scan and backup are complete, the software records file changes as they happen, so it can detect new or modified files without having to re-inspect each file. The result is a method that provides regular and high-speed backups without losing file granularity.
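A minimal sketch of the logging approach, assuming an in-memory journal with increasing sequence numbers. The `ChangeJournal` class and its methods are hypothetical; in practice the change events would come from a facility such as the NTFS USN change journal or Linux inotify rather than from manual `record` calls.

```python
from dataclasses import dataclass, field

@dataclass
class ChangeJournal:
    """Append-only log of (sequence, path, event) entries."""
    entries: list = field(default_factory=list)

    def record(self, path, event):
        """Append one change event and return its sequence number."""
        sequence = len(self.entries) + 1
        self.entries.append((sequence, path, event))
        return sequence

    def changes_since(self, last_sequence):
        """Return the latest event per path after last_sequence,
        plus the checkpoint to store for the next backup run."""
        latest = {}
        checkpoint = last_sequence
        for sequence, path, event in self.entries:
            if sequence > last_sequence:
                latest[path] = event
                checkpoint = sequence
        return latest, checkpoint
```

Each backup run asks only for the changes after its stored checkpoint, so the work scales with the number of changed files while every entry still names an individual file.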
File granularity is critical to the ongoing management of data. It allows an organization to apply specific backup policies to specific data sets or file types. It also enables the second requirement, data intelligence.
Requirement #2: Data Intelligence
Data intelligence builds on the file granularity established in requirement one. The unstructured data protection software needs to catalog each file and each version of each file and then organize that catalog for easy recall in the future. Modern unstructured data protection should do more than capture essential metadata like date created and date modified. It should also provide detailed metadata and custom tagging so users can organize the data themselves.
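As a rough illustration of such a catalog, the sketch below keeps every version of every file along with user-supplied tags. The `Catalog` and `FileVersion` names, and the choice of fields, are assumptions made for the example; a production catalog would typically live in a database.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class FileVersion:
    version: int
    size: int
    modified: float  # modification time as a Unix timestamp
    tags: set = field(default_factory=set)

class Catalog:
    """Tracks each backed-up version of each file, with custom tags."""

    def __init__(self):
        self._versions = defaultdict(list)

    def add_version(self, path, size, modified, tags=()):
        """Record a new version of path and return its catalog entry."""
        number = len(self._versions[path]) + 1
        entry = FileVersion(number, size, modified, set(tags))
        self._versions[path].append(entry)
        return entry

    def versions(self, path):
        """Return all recorded versions of path, oldest first."""
        return list(self._versions.get(path, []))

    def find_by_tag(self, tag):
        """Return the sorted paths that have at least one version carrying tag."""
        return sorted(
            path
            for path, entries in self._versions.items()
            if any(tag in entry.tags for entry in entries)
        )
```

For example, tagging sensor output with a project name lets a user later recall every file in that project without knowing where it was stored.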
The unstructured data protection solution should also use data intelligence to provide data archiving, the moving of inactive (cold) data from primary storage to secondary storage or the cloud. This critical capability provides the ability to shrink not only the size of on-premises secondary storage but also, eventually, on-premises primary storage.
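One simple way to identify cold data is by idle time. The sketch below assumes the catalog can supply a last-access timestamp per file and treats anything untouched for a configurable number of days as an archive candidate; the function name `select_cold_files` and the 180-day default are illustrative assumptions, not a recommendation.

```python
SECONDS_PER_DAY = 86400

def select_cold_files(last_access, now, max_idle_days=180):
    """Return the sorted paths whose last access is older than max_idle_days.

    last_access maps path -> last-access time as a Unix timestamp.
    """
    cutoff = now - max_idle_days * SECONDS_PER_DAY
    return sorted(path for path, accessed in last_access.items() if accessed < cutoff)
```

The selected files are the ones a policy could move from primary storage to secondary storage or the cloud.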
Requirement #3: The Cloud
One of the more significant problems with unstructured data is the rate of growth. Keeping up with the growth of unstructured data is, of course, a challenge for primary stores. The backup storage for unstructured data is expected to be 5X to 10X its primary structured counterpart. Data centers could quite literally see the storage supporting unstructured data protection consume the vast majority of data center floor space.
For many organizations, the only viable location for this data is the cloud. The problem is that most current data protection solutions only use the cloud to keep a DR copy of data. This means storing 100% of the data on backup storage in the data center and storing 100% of that same data in the cloud. As a result, these solutions do little to alleviate the on-premises capacity problem.
Instead, modern solutions need to, at minimum, archive older backups from on-premises storage to the cloud, keeping on-premises data protection storage to a minimum. The software should allow the organization to move previously backed up on-premises secondary data to the cloud. Eventually, after the creation of enough cloud copies, the modern unstructured data protection solution should allow the organization to remove data from on-premises primary and secondary storage, freeing up its capacity for other use cases and reducing the need for future purchases.
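The two-stage policy described above can be sketched as a small planning function. Everything here is an assumption made for illustration: the `BackupCopy` record, the 30-day archive threshold, and the rule that two cloud copies count as "enough" before on-premises data may be released.

```python
from dataclasses import dataclass

@dataclass
class BackupCopy:
    path: str
    age_days: int
    location: str  # "on_prem" or "cloud"

def plan_tiering(copies, archive_after_days=30, min_cloud_copies=2):
    """Return (copies to move to the cloud, paths safe to release on-premises).

    A path is releasable once it would have at least min_cloud_copies
    copies in the cloud after the planned moves.
    """
    to_cloud = []
    cloud_counts = {}
    for copy in copies:
        moving = copy.location == "on_prem" and copy.age_days >= archive_after_days
        if moving:
            to_cloud.append(copy)
        if moving or copy.location == "cloud":
            cloud_counts[copy.path] = cloud_counts.get(copy.path, 0) + 1
    releasable = sorted(
        path for path, count in cloud_counts.items() if count >= min_cloud_copies
    )
    return to_cloud, releasable
```

Separating the plan from its execution keeps the sketch testable; an actual solution would also verify each cloud copy before deleting anything on-premises.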
Unstructured data has changed. Unstructured data used to be the data that an organization protected after it protected all the “important” database data. Now, though, many organizations have determined that unstructured data is at least as critical as the data in those databases. The problem is that the organization’s backup software hasn’t kept pace. It’s time for IT professionals to revisit their data protection software and see if it can deliver the three key requirements: rapid and granular backups, data intelligence, and full support of cloud storage.
To learn more about unstructured data protection and how to modernize the process, watch our on-demand webinar “The Three New Requirements of Unstructured Data Protection”.