Unstructured data is a growing problem for enterprise organizations. An increasing number of these companies have file counts in the billions and storage capacities over 1 PB. These organizations are looking to the cloud to help mitigate some of their unstructured data management challenges.
The problem is that most cloud solutions create a separate silo of unstructured data storage. IT needs to manage this silo independently, using an entirely different storage software stack from the one the organization uses on-premises. As a result, the organization ends up running one file system in the cloud and another file system on-premises.
Developing an Unstructured Data Strategy for the Cloud
When designing an unstructured data strategy that leverages the cloud, IT needs to make sure that the solution can take advantage of all the cloud use cases. Organizations can use the cloud to store unstructured data and use cloud storage as an archive tier. They can also leverage cloud compute and quickly scale up to thousands of processors to analyze a particular unstructured data set. Organizations may also want to use the cloud to provide distributed access to a common data set.
On-Premises Lives On!
At the same time, many organizations will likely continue to rely on their on-premises processing and storage capabilities for the day-to-day processing and storing of unstructured data sets. The reality is that both on-premises and cloud storage have unique advantages when it comes to processing and storing unstructured data. Each organization’s strategy should take advantage of the unique capabilities of both.
Same File System Software On-Premises and in the Cloud
Ideally, organizations should look for solutions that enable them to run file system software both on-premises and in the cloud. Using the same software means that the two locations can communicate with each other. It also means that IT doesn’t have to learn two different ways to manage the same data set. Applications can also now move seamlessly between on-premises and the cloud, without modification.
Using the same file system software, both on-premises and in the cloud, means the two instances can work together. For example, organizations can use the storage solution’s replication software to replicate data to the cloud instantiation of the file system. It also means that the on-premises file system can leverage the cloud file system as an archive. To enable archiving, the file system software needs to allow the organization to quickly identify inactive data and move that data to a cloud archive storage tier. From the perspective of the file system, archiving is now a matter of moving data from one file system to another.
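As a rough illustration of the "identify inactive data" step, the sketch below walks a directory tree and lists files that have not been read or modified within a given number of days. It is not any vendor's method. The function name `find_inactive_files` and the `max_idle_days` threshold are hypothetical, and the sketch assumes the file system records access times (many are mounted so that access times update rarely or not at all). A production file system would more likely answer this question from its own metadata than from a full tree walk.

```python
import os
import time
from pathlib import Path
from typing import Iterator, Optional


def find_inactive_files(
    root: str, max_idle_days: float, now: Optional[float] = None
) -> Iterator[Path]:
    """Yield files under `root` not accessed or modified in `max_idle_days`.

    Hypothetical helper for illustration only; `now` can be passed in
    (seconds since the epoch) to make the cutoff reproducible.
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400  # 86400 seconds per day

    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                st = path.stat()
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
            # Treat the later of last access and last modification
            # as the time the file was last "active".
            last_used = max(st.st_atime, st.st_mtime)
            if last_used < cutoff:
                yield path
```

The resulting list would be the candidate set for a move to the cloud archive tier. The move itself is left out here because it depends entirely on the file system software in use.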
Each instance of the file system software can take advantage of the unique capabilities available to it, on-premises or in the various supported cloud providers. Each location, for example, may use a different scaling model. On-premises, because CPU resources are more static, the software may scale by using nodes that have preset CPU and storage capacities. In the cloud, because CPU resources are available by the minute, the software may allow the temporary, massive scaling of processing so that IO-intensive jobs can complete faster. Once the job is complete, the file system software can automatically “return” unneeded processing power.
Solving the Remote Employee Problem
Having the same software in the cloud as on-premises also enables organizations to provide distributed and secure access to data. For example, if a business wants to allow a new employee to work on data from a remote location, it can move the needed data to the cloud version of the file system. Then it can create a workstation instance in the cloud and allow the employee to use cloud processing to work on the data set. The employee doesn’t need to download data or have a high-powered workstation in their remote office. All processing and data movement stay within the cloud, and the device used to access the cloud workstation is essentially a terminal. After the remote worker completes the task, the organization can access the data in the cloud NAS instance as well, or move it back down to its on-premises instance.
The key to a successful cloud strategy for unstructured data is running the same NAS software in the cloud as on-premises so that data can move between the two seamlessly. With this capability, IT can leverage the best capabilities of both on-premises and the cloud.
In the Lightboard Video below, Joel Groen, Director of Cloud Go-to-Market for Qumulo, joins Storage Switzerland’s Founder and Lead Analyst, George Crump, to discuss cloud file storage requirements. Legacy architectures aren’t suited to provide a seamless “lift-and-shift” of workloads to the cloud, nor can they establish workload portability between on-premises and the cloud. Groen details how Qumulo has invested in capabilities, including multi-protocol support, deep integration with Active Directory, and a decoupled, scale-out approach, to address these needs.