Creating a Cloud Strategy for Unstructured Data

Posted on September 26, 2019 by George Crump

Unstructured data is a growing problem for enterprise organizations. An increasing number of these companies have file counts in the billions and storage capacities over 1PB. These organizations are looking to the cloud to help mitigate some of their unstructured data management challenges.

The problem is that most cloud solutions create a separate silo of unstructured data storage. IT needs to manage this separate silo independently using an entirely different storage software stack than what the organization might be using on-premises. As a result, the organization ends up running one file system in the cloud and another file system on-premises.

Developing an Unstructured Data Strategy for the Cloud

When designing an unstructured data strategy that leverages the cloud, IT needs to make sure that the solution can take advantage of all the cloud use cases. Organizations can use the cloud to store unstructured data and use cloud storage as an archive tier. They can also leverage cloud compute and scale up to thousands of processors on demand to analyze a particular unstructured data set. Organizations may also want to use the cloud to provide distributed access to a common data set.

On-Premises Lives On!

At the same time, many organizations will more than likely continue to count on their on-premises processing and storage capabilities to do the day-to-day processing and storing of unstructured data sets. The reality is that both on-premises and cloud storage have unique advantages when it comes to processing and storing unstructured data. Each organization’s strategy should make sure to take advantage of the unique capabilities of both.

Same File System Software On-Premises and in the Cloud

Ideally, organizations should look for solutions that enable them to run the same file system software both on-premises and in the cloud. Using the same software means that the two locations can communicate with each other. It also means that IT doesn’t have to learn two different ways to manage the same data set. Applications can also move between on-premises and the cloud without modification.

Using the same file system software, both on-premises and in the cloud, means the two instances can work together. For example, organizations can use the storage solution’s replication software to replicate data to the cloud instantiation of the file system. It also means that the on-premises file system can leverage the cloud file system as an archive. To enable archiving, the file system software needs to allow the organization to quickly identify inactive data and move that data to a cloud archive storage tier. From the perspective of the file system, archiving is now a matter of moving data from one file system to another.
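As a rough illustration of the "identify inactive data" step, the sketch below walks a directory tree and lists files with no recent activity. It is not how any particular file system product does this; real scale-out file systems typically answer the question from their own metadata rather than by walking the tree. The function name `find_inactive_files`, the 90-day threshold in the usage note, and the choice to treat the later of access time and modification time as "last activity" are all assumptions made for the example. Access times can also be unreliable on volumes mounted with options such as `noatime`.

```python
import os
import time
from pathlib import Path


def find_inactive_files(root, max_idle_days, now=None):
    """Return files under `root` with no access or modification
    in the last `max_idle_days` days (hypothetical helper)."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400  # seconds per day
    inactive = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                st = path.stat()
            except OSError:
                continue  # file vanished or is unreadable; skip it
            # Assumption: "last activity" is the later of atime and mtime.
            if max(st.st_atime, st.st_mtime) < cutoff:
                inactive.append(path)
    return sorted(inactive)
```

Calling `find_inactive_files("/path/to/share", 90)` would return the candidates for archiving. Moving them to the cloud tier is left out on purpose, since that step depends entirely on the file system software in use.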

Each instance of the file system software can take advantage of the unique capabilities available to it, whether on-premises or in any of the supported cloud providers. Each location, for example, may use a different scaling model. On-premises, because CPU resources are more static, the software may scale by using nodes that have preset CPU and storage capacities. In the cloud, because CPU resources are available by the minute, the software may allow the temporary, massive scaling of processing so that IO-intensive jobs can complete faster. Once the job is complete, the file system software can automatically “return” unneeded processing power.
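The cloud scaling model described above comes down to simple arithmetic: work out how many nodes a job needs to finish on time, run that many, then release the extras. The sketch below is a back-of-the-envelope illustration only. The function names, the parameters, and the assumption that throughput grows linearly with node count are all hypothetical and do not describe any vendor's behavior.

```python
import math


def nodes_for_deadline(total_gib, gib_per_node_hour, deadline_hours,
                       baseline_nodes, max_nodes):
    """Nodes to run so the job finishes by the deadline.

    Assumes throughput scales linearly with node count, which real
    clusters only approximate.
    """
    needed = math.ceil(total_gib / (gib_per_node_hour * deadline_hours))
    # Never drop below the standing cluster, never exceed the cap.
    return max(baseline_nodes, min(needed, max_nodes))


def nodes_to_release(current_nodes, baseline_nodes):
    """Nodes that can be handed back once the job is complete."""
    return max(0, current_nodes - baseline_nodes)
```

For example, a 100,000 GiB job at an assumed 500 GiB per node-hour with a two-hour deadline calls for 100 nodes. If the standing cluster is 4 nodes, 96 can be returned when the job ends.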

Solving the Remote Employee Problem

Having the same software in the cloud as on-premises also enables organizations to provide distributed and secure access to data. For example, if a business wants to allow a new employee to work on data from a remote location, it can move the needed data to the cloud version of the file system. Then it can create a workstation instance in the cloud and allow the employee to use cloud processing to work on the data set. The employee doesn’t need to download data or have a high-powered workstation in their remote office. All processing and data movement stay within the cloud. The device used to access the cloud workstation is essentially a terminal. After the remote worker completes the task, the organization can either access the data in the cloud NAS instance or move it back down to its on-premises instance.

StorageSwiss Take

The key to a successful cloud strategy for unstructured data is running the same NAS software in the cloud as on-premises so that data can move between the two seamlessly. With this capability, IT can leverage the best capabilities of both on-premises and the cloud.

In the Lightboard Video below, Joel Groen, Director of Cloud Go-to-Market for Qumulo, joins Storage Switzerland’s Founder and Lead Analyst, George Crump, to discuss cloud file storage requirements. Legacy architectures aren’t suited to provide a seamless “lift-and-shift” of workloads to the cloud, nor can they establish workload portability between on-premises and the cloud. Groen details how Qumulo has invested in capabilities, including multi-protocol support, deep integration with Active Directory, and a decoupled, scale-out approach, to address these needs.


About George Crump

George Crump is the Chief Marketing Officer at VergeIO, the leader in Ultraconverged Infrastructure. Prior to VergeIO he was Chief Product Strategist at StorONE. Before assuming roles with innovative technology vendors, George spent almost 14 years as the founder and lead analyst at Storage Switzerland. In his spare time, he continues to write blogs on Storage Switzerland to educate IT professionals on all aspects of data center storage. He is the primary contributor to Storage Switzerland and is a heavily sought-after public speaker. With over 30 years of experience designing storage solutions for data centers across the US, he has seen the birth of such technologies as RAID, NAS, SAN, Virtualization, Cloud, and Enterprise Flash. Before founding Storage Switzerland, he was CTO at one of the nation's largest storage integrators, where he was in charge of technology testing, integration, and product selection.
