Unstructured Data Is Distracting Backup Administrators

File-based data accounts for more than 80 percent of capacity demand, and backup administrators spend most of their time protecting this unstructured data. But it is the remaining set, structured data, that will cause the organization the most harm if it is not recoverable. This data (databases, VM images) requires special backups and fast recoveries. The key to protecting the organization from disaster is to eliminate the unstructured data protection problem. If backup administrators could focus 100% of their time on 20% of their problem, organizations would be in a much better position to protect themselves from a disaster.

The Importance of Unstructured Data

For many organizations, unstructured data is just as mission critical as their databases and other applications. But its protection does not need to be continuous, nor does this data need to be instantly recoverable. Still, backup administrators often find protecting unstructured data to be their biggest data protection challenge.

The Unstructured Data Challenge

The first challenge is the sheer size of the backup job. The total unstructured data set in many organizations runs to double- and even triple-digit terabytes. Second, this problem is getting worse every day because of the rapid rate of growth. Third, these files tend to be small compared to database files. Databases can be hundreds of gigabytes in size, whereas most files are less than 1MB, which means that unstructured data can represent millions, if not billions, of individual files. Finally, this data comes in a variety of formats, not just office productivity files. Unstructured data includes images, video and audio, none of which optimizes well with compression or deduplication.

These four problems lead to a backup nightmare. The sheer size of unstructured data consumes backup capacity, and its exponential growth rate means that backup resource consumption will only get worse. The number of files makes it harder for backup software to identify which files to protect, since the entire file system has to be examined for files that have changed since the last backup. The number of files also increases the size of the backup index, and the variety of formats means that optimizing the data stored becomes more challenging.
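
The full-scan problem described above can be sketched in a few lines. This is an illustrative model, not any vendor's implementation: the backup software must stat every file in the tree just to discover the handful that changed, which is why file count, not just capacity, drives backup windows.

```python
import os

def find_changed_files(root, last_backup_time):
    """Walk the entire tree, reading metadata for EVERY file, and
    return only those modified since the last backup. The cost of the
    scan grows with the total file count, not the changed count."""
    changed = []
    scanned = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            scanned += 1
            try:
                if os.path.getmtime(path) > last_backup_time:
                    changed.append(path)
            except OSError:
                continue  # file deleted or unreadable mid-scan
    return changed, scanned
```

With millions of files, `scanned` is always millions, even on a quiet day when `changed` holds only a few entries.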

Beyond data protection, there is also a need to keep this information for a long time. Retention means storing a readable copy of the information in the most cost-effective way possible.

Move To A “Built-in” Data Protection Model

How can IT planners automate the protection of unstructured data so they can focus on the more mission critical structured data sets in their environment? To accomplish this, they need to move to a built-in data protection model. In this model, the same infrastructure that stores the unstructured data is also responsible for protection and retention. Solving this problem requires a capable front end, combined with a means to protect and archive. A complete solution like this removes the burden of managing the day-to-day protection of unstructured data from the backup administrator.

The Active Archive Solution

Forward-thinking IT planners first tried to solve this problem with active archive solutions. These solutions integrate disk storage and a tape library so that they appear as a virtualized network mount point. Data written to the virtualized network mount point is also copied to tape. The archive solution removes data from the disk area as it ages, keeping disk growth in check.
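
The aging policy at the heart of an active archive can be modeled as a simple rule. The catalog structure and thresholds below are hypothetical, meant only to show the logic: a file leaves the disk tier when a tape copy exists and it has gone untouched past the aging window.

```python
import time

def files_to_evict(catalog, max_age_days=90, now=None):
    """Return paths whose disk copy can be released.

    catalog maps path -> {"last_access": epoch seconds, "on_tape": bool}.
    A file is aged off the disk tier only when a tape copy already exists
    and it has not been touched within the window; reads after eviction
    fall back to the (slower) tape copy.
    """
    if now is None:
        now = time.time()
    cutoff = now - max_age_days * 86400
    return sorted(p for p, meta in catalog.items()
                  if meta["on_tape"] and meta["last_access"] < cutoff)
```

Running this on a schedule is what keeps disk growth in check while the tape library absorbs the long tail of cold data.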

The problem with most active archive solutions is that the disk front end provides only basic functionality. For example, most provide NFS and CIFS access, but few provide full Active Directory integration. Without Active Directory integration, users have to be added to the share manually, a time-consuming and error-prone process, especially in large organizations. These solutions also typically don't include NAS features like snapshots or replication. Finally, they expect a tape library to be the final back-end storage device. For a great many businesses, the storage capacity of a large tape library may be overkill.

The Cloud Solution

A second option is to leverage the cloud as a repository for end-user data. In fact, a large number of businesses have already moved all of their user data into the cloud. This decision, however, has created new challenges for some organizations, not the least of which is concern over data privacy. In addition, most online file sharing solutions focus only on storing office productivity files. Add other forms of unstructured data to these solutions and they may exhibit scaling problems. Finally, there is a problem with data distribution. Most of these solutions expect each user either to store all the data on a laptop or to access that data from the cloud. Most laptops don't have the capacity to store the entire organization's data store, nor would an organization want that.

Cloud Integrated vs. Cloud As A Service

Using cloud storage is the right idea, but the typical file sync and share model won't work, nor will those backup solutions that have recently added a “cloud extension”. The problem with most of these solutions is that they have too many moving parts. One of the reasons Dropbox became so successful, and such a thorn in IT's side, is that it was turnkey and service-like. The enterprise can benefit from a similar product, but only one with more enterprise-class capabilities and security.

For example, a cloud-only access model like Dropbox's would seem ideal, but such models lack the security and the complete feature set of a full-featured NAS, and cloud-only access is too slow for most users. A viable alternative is to leverage a solution like The Nasuni Service. These solutions extend the cloud into an organization's data center by caching only the most active data, and they provide a complete service instead of the more common piecemeal approach that requires you to bring your own cloud, your own hybrid appliance and your own software. Most large businesses and enterprises simply don't have the time to put this erector set together. Many need a turnkey approach like Nasuni's service offerings.

Cloud as a Service storage works like an active archive. The local filer stores all new or changed data. That data is then copied to the cloud instead of tape, providing automatic protection and retention. Then, as data in the local filer cache ages, it is migrated to the cloud for cost-efficient off-site retention. This process keeps the local filer's storage capacity compact, which means high-performance SSD storage can be leveraged to provide even better response times.
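
The write-through-then-age behavior described above can be sketched as a small cache model. This is a toy illustration of the general pattern, not Nasuni's actual implementation: every write lands locally and is copied to the cloud, and when the local cache fills, the coldest files are evicted locally because the cloud copy remains authoritative.

```python
from collections import OrderedDict

class FilerCache:
    """Toy model of a cloud-backed filer cache (illustrative only)."""

    def __init__(self, capacity_files):
        self.capacity = capacity_files
        self.local = OrderedDict()   # path -> data, kept in LRU order
        self.cloud = {}              # stand-in for object storage

    def write(self, path, data):
        self.local[path] = data
        self.local.move_to_end(path)
        self.cloud[path] = data      # protection copy, in place of tape
        while len(self.local) > self.capacity:
            self.local.popitem(last=False)  # age out the coldest file

    def read(self, path):
        if path in self.local:       # hot: served from local SSD speed
            self.local.move_to_end(path)
            return self.local[path]
        data = self.cloud[path]      # cold: fetched with Internet latency
        self.write(path, data)       # re-warm the cache for future reads
        return data
```

Because eviction never touches the cloud copy, the local tier can stay small (and therefore all-SSD) without any data ever becoming unprotected or unreachable.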

Using the cloud for the final storage area allows businesses of all sizes to take advantage of better unstructured data management. The investment in cloud storage capacity is incremental, meaning businesses only pay for what they use. By leveraging the cloud as their back-end storage, developers like Nasuni can focus their efforts on developing intelligent file software technology. They are able to create robust filer capabilities that include full AD integration and snapshots. Finally, these solutions can even replace the demand for file sync and share by providing a global file system with global locking capabilities. Global file locking ensures that when a user accesses a file, it shows as in use, or locked, to the rest of the organization, regardless of location. If another user accesses the same file, they will only be able to open a read-only version of it.
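
The global locking semantics just described reduce to a simple rule, sketched below. The class and method names are hypothetical, not an actual Nasuni API: the first opener anywhere in the organization gets a writable handle, and everyone else sees the file as locked and can open only a read-only copy until it is released.

```python
class GlobalLockTable:
    """Minimal sketch of global file locking semantics (illustrative)."""

    def __init__(self):
        self._locks = {}  # path -> user currently holding the write lock

    def open(self, path, user):
        """Grant read-write to the first opener, read-only to everyone else."""
        holder = self._locks.get(path)
        if holder is None:
            self._locks[path] = user
            return "read-write"
        if holder == user:
            return "read-write"   # the holder can reopen their own file
        return "read-only"        # file shows as locked to the rest

    def close(self, path, user):
        """Release the lock so the next opener can get write access."""
        if self._locks.get(path) == user:
            del self._locks[path]
```

In a real global file system this table would be coordinated through the cloud so the rule holds across every site, which is what lets it stand in for file sync and share.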


By leveraging a cloud-based solution, the unstructured data struggle can be almost entirely eliminated, because backup and disaster recovery copies occur in near real time, without administrator intervention. What's more, the data remains completely accessible to users and applications. The on-premises cache delivers the most active data instantly, and older data is still accessible, albeit with some nominal Internet latency. With 80% of their backup problem removed, administrators can now spend 100% of their time ensuring the protection of the business's mission-critical data.

Sponsored by Nasuni


Twelve years ago George Crump founded Storage Switzerland with one simple goal: to educate IT professionals about all aspects of data center storage. He is the primary contributor to Storage Switzerland and a sought-after public speaker. With over 25 years of experience designing storage solutions for data centers across the US, he has seen the birth of such technologies as RAID, NAS and SAN, virtualization, cloud and enterprise flash. Prior to founding Storage Switzerland he was CTO at one of the nation's largest storage integrators, where he was in charge of technology testing, integration and product selection.
