Where unstructured data previously comprised a minority of the business’s data and was not strategic to the organization, that balance has now tipped dramatically. Unstructured data may account for two-thirds or more of the data that businesses are collecting and storing, and, through analytics and machine learning, it is yielding insights that move the needle when it comes to competitive advantage. The desire for these insights is also increasing unstructured data retention times.
The robust growth of unstructured data adds complexity when it comes to serving file data. Files vary in size and type, they are distributed globally across the organization’s locations, and they must be accessed by a user base that is equally dispersed. Previously, Storage Switzerland discussed the collaboration and performance challenges inherent in trying to serve this environment with legacy network-attached storage (NAS) arrays. In this installment, we will discuss how to protect this data – which is no easy feat, especially in the face of strict data privacy laws.
Protection Is No Longer a Nice-to-Have
In the past, many organizations left their unstructured file data unprotected because of the cost and complexity involved. The scale of the required data protection implementation often outpaces budgets and, from a management perspective, strains IT staff resources.
In today’s world, however, this is no longer an option. In addition to the data’s increasing criticality, data privacy regulations, including the European Union’s General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), require an organization to gain comprehensive visibility into the data it is storing and into how that data is stored and accessed. These regulations are also making consumers more mindful of how their data is stored and used by the business. Meanwhile, business requirements such as analytics and DevOps are creating the need for more data to be stored in a way that preserves its integrity.
Traditional File System Backup
Backing up unstructured data is time-consuming. Also, using a typical image-based backup technology sacrifices file-level granularity, which makes adhering to regulations like GDPR more difficult. Administering and recovering from backup data such as snapshots is expensive and cumbersome. The snapshot technology that typical backup solutions rely on consumes expensive primary storage capacity. Furthermore, most snapshots rely on metadata for tagging, which heavily taxes the storage system’s Input/Output Operations Per Second (IOPS). In fact, many legacy file systems cap the number of snapshots that can be retained in the interest of preserving system performance. Snapshot vaulting and mirroring may alleviate some of these headaches, but they may still impose retention limitations and require investment in additional capabilities such as replication software and wide-area network (WAN) acceleration – not to mention the storage infrastructure resources and software licenses needed to host snapshots.
A Newer Approach: Continuous File Versioning
Continuous file versioning is emerging as a less expensive and more agile approach to ensuring data availability and recovery. Like traditional snapshot technology, continuous file versioning starts by capturing an image of an entire volume. From there, however, subsequent captures record only the changes to the storage volume – thus making more efficient use of storage capacity and network bandwidth (the latter being especially true when images are stored in a cloud service). Storage managers should look for a continuous file versioning technology that employs a data sharding process, which enables smaller increments of change to be captured and processed in parallel, avoiding an impact on performance and increasing scalability.
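To make the sharding idea more concrete, the minimal Python sketch below shows one way changed data could be split into fixed-size shards, hashed, and stored in parallel so that only modified shards travel over the network. The shard size, function names, and the dictionary standing in for a cloud object store are illustrative assumptions, not any particular vendor’s implementation.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

SHARD_SIZE = 4 * 1024 * 1024  # 4 MiB per shard -- illustrative value only


def shard_file(path):
    """Split a file into fixed-size shards, yielding (offset, bytes) pairs."""
    with open(path, "rb") as f:
        offset = 0
        while True:
            chunk = f.read(SHARD_SIZE)
            if not chunk:
                break
            yield offset, chunk
            offset += len(chunk)


def version_file(path, previous_hashes, store):
    """Capture a new version of `path`, keeping only the shards that changed.

    previous_hashes: dict of shard offset -> SHA-256 digest from the last
    captured version (empty for the initial full capture).
    store: dict-like object standing in for a cloud object store keyed by
    content hash.
    """
    def process(offset_chunk):
        offset, chunk = offset_chunk
        digest = hashlib.sha256(chunk).hexdigest()
        if previous_hashes.get(offset) != digest:
            # Shard changed since the last version: keep its new content
            # (keyed by hash, so identical content is never stored twice).
            store.setdefault(digest, chunk)
        return offset, digest

    # Shards are independent, so they can be hashed and checked in parallel.
    with ThreadPoolExecutor(max_workers=8) as pool:
        new_hashes = dict(pool.map(process, shard_file(path)))

    return new_hashes  # becomes previous_hashes for the next capture
```

The first call with an empty previous_hashes dictionary captures the full image; every subsequent call transfers only the shards that have actually changed, which is where the capacity and bandwidth savings come from.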
Rapid Disaster Recovery
In today’s around-the-clock business world, server downtime and loss of data access can wreak havoc on a company’s operations and credibility. The impact can be even more devastating when multiple departments and locations rely on a particular site for key information. When it comes to ensuring speedy, secure and high-performance access to file data, infrastructure planners and storage managers should consider investing in a solution that preserves multi-site access to files and that allows the “down” site to be restored as quickly as possible. Users throughout the enterprise should be able to access the files related to their workflows even during the outage. Likewise, the standard for restoring onsite file access should be minutes, not hours or days. Solutions that are designed to rapidly restore edge resources and rehydrate them with frequently accessed files offer strong advantages here.
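As a rough illustration of the rehydration idea, the Python sketch below prioritizes which files to pull back to a restored edge site first, hottest files before cold ones, until a warm cache is full. The metadata fields, cache sizing, and function name are assumptions made for illustration rather than a description of any specific product.

```python
def rehydration_order(catalog, cache_capacity_bytes):
    """Decide which files to pull back first onto a restored edge site.

    catalog: list of dicts with 'path', 'size', and 'access_count' fields
    (hypothetical metadata). Returns the paths to rehydrate, most frequently
    accessed files first, until the warm cache is full.
    """
    ranked = sorted(catalog, key=lambda f: f["access_count"], reverse=True)
    selected, used = [], 0
    for entry in ranked:
        if used + entry["size"] > cache_capacity_bytes:
            continue  # this file no longer fits in the warm cache
        selected.append(entry["path"])
        used += entry["size"]
    return selected


# Example: restore the hottest files into a 1 GiB warm cache first.
catalog = [
    {"path": "/finance/q3-report.xlsx", "size": 300_000_000, "access_count": 52},
    {"path": "/archive/2015-backup.tar", "size": 900_000_000, "access_count": 1},
    {"path": "/design/logo.psd", "size": 200_000_000, "access_count": 17},
]
print(rehydration_order(catalog, 1_073_741_824))
# -> ['/finance/q3-report.xlsx', '/design/logo.psd']
```

Files left out of the initial pass would be fetched on demand from the central copy, consistent with the minutes-not-hours restore goal described above.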
Our next installment will explore the value of the cloud for facilitating distributed file data availability and protection, as well as scalable and global secure access to this data.