How to Evolve From Unstructured Data Protection to Unstructured Data Management

A recent report from Igneous, “The State of Unstructured Management 2018,” indicates that most IT professionals are not satisfied with their ability to meet organizational expectations for the backup, recovery, and retention of their unstructured data sets. The value that organizations place on their data is also increasing: the same report found that the only asset organizations ranked as more important than “data” was “customers.”

The report, based on a survey of 200 IT leaders managing massive amounts of file data, also shows a changing perspective on unstructured data concerns that reflects the increase in the value of the organization’s data. While concerns around protecting data, meeting data capacity demands, and securing data still exist, their importance relative to data accessibility, data governance, and data insight has diminished. Organizations’ top priorities now include the ability to discover and find data, meet regulatory compliance requirements, and understand data relevance. Unstructured data protection needs to work seamlessly within a larger unstructured data management strategy, but most organizations do not have the tools to accomplish the more challenging management tasks.

The problem is that unstructured data protection is often managed separately from unstructured data management. Classically, a backup administrator is responsible for unstructured data protection, and an archive administrator is responsible for overall unstructured data management.

Running the backup and archive processes as separate silos is problematic. It requires at least two processes, each of which must interact with the filesystems that store the unstructured data. The separation of the two processes means two sets of software to perform the functions, twice as much filesystem interaction, and most importantly, twice as much storage.

The Backup Challenge

A significant concern is that most organizations use their backup process to fulfill at least part — and in many cases all — of their organization’s data retention and data compliance needs. Recovery, which should be the primary reason organizations invest in backup in the first place, most often occurs from the most recent backup, and recoveries from backups older than a week account for less than 1% of all recovery efforts. Despite these recovery patterns, most organizations retain backup data for years, if not decades. The reason? To meet data retention and data compliance concerns.

Another concern is that most backups today are image-based, and while image-based backups enable granular file restoration, the IT administrator has to know which backup job contains the correct version of the needed file. Most solutions cannot search across backup jobs to find all occurrences of a file or group of files. The inability to quickly search across image-based backups makes them ill-suited for meeting compliance needs.
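The difference is essentially one of indexing. The Python sketch below shows a hypothetical per-file catalog that answers the cross-job question directly; the job identifiers and paths are illustrative only, not any vendor’s actual format.

```python
from collections import defaultdict

def build_file_index(backup_jobs):
    """Build a per-file catalog from backup jobs, where each job is a
    (job_id, list_of_paths) pair. The index maps every path to each job
    holding a copy -- the cross-job lookup that opaque image-based
    backups cannot answer without mounting and scanning each image."""
    index = defaultdict(list)
    for job_id, paths in backup_jobs:
        for path in paths:
            index[path].append(job_id)
    return index
```

With such a catalog, finding every retained copy of a file is a dictionary lookup instead of a restore-and-scan of each image.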

The Archive Challenge

A backup process that retains only a few months or even a few weeks of backup data will serve an organization’s backup needs better than one that retains data indefinitely. The organization can then meet retention requirements by copying all data to an archive. An archive provides a file-by-file view instead of a “job view,” enabling it to easily fulfill requests for a particular file or version of a file. The problem with the archive process is getting data to the archive reliably. Most archive solutions don’t provide native data movement capabilities the way backup software does. They typically work by performing a file-by-file scan of the filesystem, looking for candidates to move to the archive, which is a very time-consuming process.
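To illustrate why this scan is so costly, here is a minimal Python sketch of the file-by-file walk a traditional archive tool performs; it must stat every file in the tree on every pass, and it assumes POSIX-style access times are available and trustworthy.

```python
import os
import time

def find_archive_candidates(root, max_idle_days=90):
    """Walk a directory tree and return files whose last access time is
    older than the idle threshold -- the slow, file-by-file scan that
    traditional archive tools rely on to find migration candidates."""
    cutoff = time.time() - max_idle_days * 86400
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.stat(path).st_atime < cutoff:
                    candidates.append(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return candidates
```

On a filesystem with hundreds of millions of files, a walk like this can take days, which is exactly why scan-based archiving struggles at scale.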

Archives are also typically limited in their storage options, archiving to a large tape system, NAS system, or object store. Many have a surprising lack of support for cloud storage. Archive storage is also typically standalone and separate from the backup store.

Evolving Data Protection and Data Archive to Integrated Unstructured Data Management

Organizations need to look for solutions that enable them to evolve from protecting data to actively managing it. Additionally, instead of creating a standalone solution for data protection, suppliers need to integrate the two processes. Although data protection is a foundational component of Unstructured Data Management (UDM), UDM is more than just backup and archive. It also encompasses copy management, data replication, disaster recovery, dataset discovery/analytics, robust search, data privacy compliance, and data movement/migration workflows, along with the idea that when these processes are integrated with each other, they all become more efficient.

The Requirements for Unstructured Data Management

The first requirement is a single-step interaction with data sources. Even if the organization is performing just backup and archive, these two processes represent two passes across every storage system in the data center. Expanding to all the capabilities of UDM without integrating them may mean five or six passes across all the storage systems in the environment. Each of these silos also requires individual management and configuration. The lack of an integrated UDM approach is why so many organizations look to backup as their single UDM solution, despite its inability to deliver all the required functions.

The second requirement is a set of robust data protection capabilities to complement UDM’s broader feature set. If data protection and management are to be combined, the UDM solution needs to provide continuous data backup. As users or applications add data to, or change files on, servers, the UDM solution must copy them to a secondary storage area very quickly. This rapid copying of data means that the solution inherently protects against risks ranging from natural disasters to ransomware and other cyber-attacks.
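As a rough illustration, one polling pass of such a continuous-backup loop could look like the Python sketch below. A production solution would use filesystem change notification or snapshot differencing rather than re-walking the tree; the function name and state dictionary are hypothetical.

```python
import os
import shutil

def sync_changed_files(source_dir, backup_dir, seen_mtimes):
    """One polling pass of a naive continuous-backup loop: copy any file
    that is new, or whose modification time changed since the last pass,
    into the backup area while preserving relative paths. `seen_mtimes`
    is a dict (relative path -> mtime last copied) kept between passes."""
    copied = []
    for dirpath, _dirnames, filenames in os.walk(source_dir):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, source_dir)
            mtime = os.stat(src).st_mtime
            if seen_mtimes.get(rel) != mtime:
                dst = os.path.join(backup_dir, rel)
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copy2(src, dst)  # copy file data plus metadata
                seen_mtimes[rel] = mtime
                copied.append(rel)
    return copied
```

Running this pass on a short interval approximates continuous protection: only new or changed files move on each pass, keeping the secondary copy close behind production.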

The third requirement of UDM is intelligence. Why? For one, intelligent data management places less pressure on the backup component of the solution. Backup essentially becomes a means of data transport, and a system that can predict which data needs to be retained can optimize the location of that data once it is in the UDM system, vastly simplifying the job of “backup.” For another, intelligence allows the solution to provide exact detail on the data it stores. Search should be universal and operate across all tiers of storage, whether on-premises or in the cloud. The UDM solution should also provide insight into production data, with advice on what data to move off of primary storage, thus lowering primary storage costs.

Management of the secondary storage tier is another critical aspect of UDM. Organizations are increasing their investment in secondary storage and are now in danger of being overwhelmed by it. The tier has to scale to meet the explosive growth of data, the majority of which occurs on the second tier. The costs and operational overhead required to purchase, maintain, and upgrade secondary storage as it scales are daunting. Secondary storage scalability has two components: the software has to scale to track the details of the billions of files it may manage, and the capacity has to scale to physically store all the data. To solve these issues, UDM should offer the customer the option of delivering secondary storage “as-a-service” and integrate with both on-premises storage and cloud storage.

The Benefits of Unstructured Data Management

In the past, vendors sold archive with the promise, “It can pay for itself by reducing the primary storage footprint.” Archive solutions only reduce the cost of primary storage if IT trusts the solution enough to let it move data off primary storage, and they require that IT “jump in with both feet” and start migrating data immediately. Archive vendors build their return on investment (ROI) models on the assumption that IT migrates 50% or more of its data to the archive. Today, IT seldom makes these jumps and ultimately abandons most archive projects.

UDM is different. First, it solves an immediate problem that IT faces: how to adequately protect unstructured data stores on a continuous, consistent basis. It does so by integrating unstructured data backup with other data management functions. The continuous backup capability gives IT the assurance that it can recover from almost any type of disaster, including ransomware.

Second, the concept of integrated backup provides greater confidence in taking the next step of archiving old data off of primary storage. The integration means that, for instance, organizations can set up a policy that says, “Only remove files from production if they haven’t been accessed in 90 days AND they have been adequately secured by the backup process.” Also, since the UDM solution is already paying dividends on data protection, there is less pressure to reach the 50% migration mark immediately. IT can dip its toes in the archive waters to build confidence in the solution.
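A policy like that is a simple conjunction of two checks. The Python sketch below is illustrative only: `backed_up_paths` stands in for a hypothetical backup-catalog lookup that confirms a file is already protected.

```python
import os
import time

def eligible_for_archive(path, backed_up_paths, min_idle_days=90):
    """Apply the combined policy from the text: a file may leave primary
    storage only if it has been idle for at least `min_idle_days` AND the
    integrated backup process already holds a copy. `backed_up_paths` is
    a stand-in for a backup catalog and is purely illustrative."""
    idle_days = (time.time() - os.stat(path).st_atime) / 86400
    return idle_days >= min_idle_days and path in backed_up_paths
```

The second clause is what integration buys: a standalone archive tool cannot evaluate it, because the backup catalog lives in a different product.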

Even before archiving, however, the UDM solution provides cross-tier search and file insights so that the organization can find data regardless of its storage location. UDM enables the organization not only to lower costs but also to meet compliance regulations, including GDPR and other data privacy legislation such as the California Consumer Privacy Act.

In short, UDM solves a big problem facing IT right now: unstructured data protection. It also sets the stage for organizations to meet regulatory and compliance concerns, and lowers overall storage costs.


George Crump is the Chief Marketing Officer at VergeIO, the leader in Ultraconverged Infrastructure. Prior to VergeIO he was Chief Product Strategist at StorONE. Before assuming roles with innovative technology vendors, George spent almost 14 years as the founder and lead analyst at Storage Switzerland. In his spare time, he continues to write blogs on Storage Switzerland to educate IT professionals on all aspects of data center storage. He is the primary contributor to Storage Switzerland and is a heavily sought-after public speaker. With over 30 years of experience designing storage solutions for data centers across the US, he has seen the birth of such technologies as RAID, NAS, SAN, Virtualization, Cloud, and Enterprise Flash. Before founding Storage Switzerland, he was CTO at one of the nation's largest storage integrators, where he was in charge of technology testing, integration, and product selection.

