Unstructured data is a big problem for IT professionals. They have to wrestle with serving, storing, protecting and retaining all of the information in their unstructured data sets. Legacy NAS solutions simply can’t keep up. Cloud storage seems like an ideal solution except when it comes to the first requirement, serving data, because performance and latency matter.
The Cloud NAS Problem
Using cloud storage to solve an organization’s unstructured data problems became more realistic when vendors started to deliver Cloud NAS solutions. These solutions consisted of an on-premises appliance that cached data locally while storing it in the cloud. Because of cloud latency, however, the appliance had to be sized with enough capacity that cache misses were extremely rare. Unless the cache holds 100% of the data, some misses are inevitable. As a result, only applications that could tolerate the time needed to retrieve data from the cloud could leverage the solution.
Consequently, organizations can use Cloud NAS only in very limited use cases, the most common being the replacement of file servers for user home directories. While user files are certainly important, home directories are only part of the problem. Unstructured data is growing in many other ways, and organizations need a solution for all of it.
The Unstructured Data Solution
One solution to the unstructured data problem is to leverage an edge computing model with local cloud providers or points of presence, reducing latency to the point that applications won’t abort while waiting for a cloud transfer and users won’t throw their hands up in disgust. Data is still cached on-premises, as in the Cloud NAS model, but the cache can be much smaller: because the secondary repository is much closer, the penalty for a cache miss is far less severe.
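The cache-sizing argument can be made concrete with the standard average-access-time formula for a cache backed by one slower tier: effective latency = h × cache latency + (1 − h) × miss latency, where h is the hit rate. The sketch below uses purely hypothetical latencies (1 ms for the on-premises cache, 10 ms for an edge provider, 100 ms for a distant public cloud region), not measurements, and solves for the hit rate needed to reach an assumed 2 ms target. It also simplifies by charging a miss only the latency of the next tier.

```python
def effective_latency_ms(hit_rate: float, cache_ms: float, miss_ms: float) -> float:
    """Average access time for a cache backed by one slower tier.

    Simplification: a miss is charged only the next tier's latency.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_ms + (1.0 - hit_rate) * miss_ms


def required_hit_rate(target_ms: float, cache_ms: float, miss_ms: float) -> float:
    """Solve target = h * cache + (1 - h) * miss for h, clamped to [0, 1]."""
    if miss_ms <= cache_ms:
        raise ValueError("the backing tier must be slower than the cache")
    h = (miss_ms - target_ms) / (miss_ms - cache_ms)
    return min(max(h, 0.0), 1.0)


# Hypothetical latencies chosen for illustration only, not measurements.
CACHE_MS = 1.0           # on-premises cache
EDGE_MS = 10.0           # nearby edge / regional provider
PUBLIC_CLOUD_MS = 100.0  # distant public cloud region
TARGET_MS = 2.0          # assumed acceptable average latency

if __name__ == "__main__":
    for label, miss_ms in (("public cloud", PUBLIC_CLOUD_MS), ("edge", EDGE_MS)):
        h = required_hit_rate(TARGET_MS, CACHE_MS, miss_ms)
        print(f"{label}: needs {h:.1%} hit rate")
    # public cloud: needs 99.0% hit rate
    # edge: needs 88.9% hit rate
```

With these assumed numbers, backing the cache with a nearby edge tier meets the same target at roughly an 89% hit rate instead of about 99%, which is why the on-premises cache can be smaller. Real workloads will differ; the point is the shape of the trade-off, not the specific figures.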
The challenge with this approach is that because the provider is closer, a regional disaster that strikes the organization could, in theory, also affect the provider. To protect against that, the regional cloud provider should replicate data to another location or to the public cloud. The replicated data should also be accessible to cloud compute, so the organization can use the secondary cloud for disaster recovery or for running compute-intensive analytics.
The result is a hybrid solution: the most active data is cached on-premises, secondary or near-active data is stored at the edge, and the public cloud is used for disaster recovery, long-term retention and large-scale compute processing.
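The three-tier layout can be sketched as a read path that checks each tier in order of distance. The class below is a hypothetical, in-memory illustration; the `HybridStore` name, the LRU eviction policy and the inline replication are assumptions, not any vendor's design. Writes land in the on-premises cache and at the edge and are replicated to the public cloud; reads fall through to the next tier on a miss and promote the result back into the local cache.

```python
from collections import OrderedDict


class HybridStore:
    """Hypothetical three-tier store, for illustration only.

    Tier 1: small on-premises LRU cache for the most active data.
    Tier 2: edge / regional cloud holding near-active data.
    Tier 3: public cloud copy used for DR, retention and analytics.
    """

    def __init__(self, cache_capacity: int) -> None:
        self.cache_capacity = cache_capacity
        self.on_prem = OrderedDict()  # key -> bytes, ordered by recency
        self.edge = {}                # key -> bytes
        self.public_cloud = {}        # key -> bytes

    def _cache_put(self, key: str, value: bytes) -> None:
        # Insert or refresh the key, then evict the least recently used entry.
        self.on_prem[key] = value
        self.on_prem.move_to_end(key)
        if len(self.on_prem) > self.cache_capacity:
            self.on_prem.popitem(last=False)

    def write(self, key: str, value: bytes) -> None:
        self._cache_put(key, value)
        self.edge[key] = value
        # Replicate to the public cloud so a regional disaster at the edge
        # provider does not lose data (done inline here for simplicity;
        # a real system would likely replicate asynchronously).
        self.public_cloud[key] = value

    def read(self, key: str) -> tuple[bytes, str]:
        """Return (value, name of the tier that served it)."""
        if key in self.on_prem:
            self.on_prem.move_to_end(key)  # mark as most recently used
            return self.on_prem[key], "on-prem"
        for tier_name, tier in (("edge", self.edge), ("public-cloud", self.public_cloud)):
            if key in tier:
                value = tier[key]
                self._cache_put(key, value)  # promote into the local cache
                return value, tier_name
        raise KeyError(key)


if __name__ == "__main__":
    store = HybridStore(cache_capacity=2)
    for key, value in (("a", b"1"), ("b", b"2"), ("c", b"3")):
        store.write(key, value)  # "a" is evicted from the local cache
    print(store.read("a"))       # (b'1', 'edge'), then promoted
    print(store.read("a"))       # (b'1', 'on-prem')
    store.edge.clear()           # simulate losing the edge provider
    print(store.read("b"))       # (b'2', 'public-cloud')
```

The small local cache absorbs repeat reads, the edge tier keeps a miss cheap, and the public cloud copy survives the loss of the edge provider, mirroring the division of roles described above.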
Unstructured data is a significant challenge facing IT, and the problem is only going to get worse. Offloading unstructured data to the cloud is a logical solution, but only if the solution can overcome the latency problem. By leveraging the edge, the hybrid cloud solution solves many of an organization’s unstructured data challenges.
Storage Switzerland has two resources available that go into more detail about using the cloud to solve unstructured data challenges. First is our on-demand webinar “NAS Refresh? – Five Reasons to Consider the Cloud” and second is our white paper “Understanding Cloud NAS Architectures,” which is attached to the webinar.
Register now for the on-demand webinar and get immediate access to the white paper.