Multi-cloud is real: most organizations have multiple cloud relationships and plan to maintain them. Even AWS CEO Andy Jassy said in a recent Wall Street Journal interview, “There will not be only one cloud…but rather a handful of large cloud firms with companies taking a multi-provider approach to the cloud”. Storage Switzerland believes that beyond a handful of large cloud providers, there will also be an almost unlimited number of regional and metro cloud providers, as well as private local object stores.
The question is not whether there will be multiple clouds, nor even whether organizations will use multiple clouds, but how an organization will move data between them. While the majority of organizations have multiple cloud relationships, most do not have a clear strategy for moving data between their various cloud providers.
The fundamental problem is that data has weight; it can’t simply spin up on demand like compute resources. Data needs strategic positioning, often in advance, so that it is in the right place at the right time. While a few companies claim multi-cloud data capabilities, those capabilities are usually tied to a specific solution. A backup appliance that claims multi-cloud support, for example, can replicate data to a variety of cloud providers, but only for its own data set.
The organization needs a multi-cloud data controller that is more global in its approach. The multi-cloud data controller provides not only a global file system and namespace but also a framework to automatically move and copy data throughout the various clouds with which an organization may interoperate.
What is a Cloud Data Controller?
A cloud data controller is a solution that can mobilize data from on-premises storage systems to the cloud and from one public cloud to others, even to multiple locations simultaneously. Some vendors use the term cloud data fabric to describe a solution that makes it easier for an organization to move data between clouds. These solutions typically lack the ability to present a single global file system that spans all of the organization’s on-premises and cloud resources. Others force a conversion from the current open object storage format to a proprietary file system and do not store data in a format native to the cloud on which it resides.
A cloud data controller creates a single global file system that spans both on-premises object stores and cloud storage. The controller should also be able to incorporate more traditional “legacy” NAS stores over protocols such as NFS and SMB, which eventually allows the organization to consolidate these storage systems into the same solution. It also stores data in a format native to the cloud in which it resides, so that it is easy to assign computing resources to operate on that data.
The Requirements of a Cloud Data Controller
Requirement # 1 – Single Name Space
Several vendors are developing multi-cloud solutions. Most of these solutions copy data from one cloud to another but require IT to manage each cloud storage pool separately. A cloud data controller creates a global file system enabling centralized management of the entire storage infrastructure, no matter where the physical storage is located. The single namespace should incorporate on-premises object storage, on-premises NAS storage and the cloud storage from each cloud provider. The cloud data controller should also enable the organization to perform metadata searches across both public and private clouds.
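To make the idea of a single namespace concrete, the sketch below shows what a metadata search might look like from an application’s point of view. It assumes a hypothetical S3-compatible endpoint exposed by the controller, and the bucket names, credentials and metadata key are illustrative only; it is not any specific product’s API.

```python
# Minimal sketch: querying one namespace that fronts several storage backends.
# The endpoint URL, bucket names, and metadata key are assumptions, not a
# specific product's API; any S3-compatible controller endpoint would do.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://controller.example.com",  # assumed controller endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Buckets the controller maps to different physical locations
# (e.g., on-premises object store, AWS, Azure) behind one namespace.
buckets = ["onprem-archive", "aws-analytics", "azure-backup"]

# Find every object tagged with a given project, regardless of where it lives.
for bucket in buckets:
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            head = s3.head_object(Bucket=bucket, Key=obj["Key"])
            if head.get("Metadata", {}).get("project") == "genomics-q3":
                print(f"{bucket}/{obj['Key']} ({obj['Size']} bytes)")
```

The point of the sketch is that the application talks to one endpoint and one credential set; where each bucket physically lives is the controller’s concern.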
Requirement # 2 – Cloud Native Storage
In addition to creating a single namespace, the cloud data controller should also store data in a format native to the storage system in which the data resides. That means that if data is stored in Amazon S3, it is stored in native S3 format. If the controller later copies that data to Microsoft Azure, it converts the data during transmission into Azure Blob Storage’s native format. Some cloud file system products create the global namespace by introducing a new file system protocol, and a version of their software must be present in each cloud in order to interact with the data.
Storing data in the native format of the cloud provider is critical to make sure that computing resources in those clouds can actually process the data stored there without the insertion of additional drivers or software.
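As a rough illustration of what “native on both sides” means, the sketch below copies a single object from Amazon S3 to Azure Blob Storage using each cloud’s own API. The bucket, container and key names are placeholders; a real controller would perform this conversion server-side and continuously, not client-side per object.

```python
# Minimal sketch of moving one object from Amazon S3 to Azure Blob Storage,
# reading and writing each side in its native format. Bucket, container, and
# key names are placeholders for illustration only.
import os

import boto3
from azure.storage.blob import BlobServiceClient

s3 = boto3.client("s3")  # standard AWS credentials from the environment
azure = BlobServiceClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"]
)

SOURCE_BUCKET = "analytics-data"     # assumed S3 bucket
SOURCE_KEY = "2018/10/results.parquet"
TARGET_CONTAINER = "analytics-data"  # assumed Azure container

# Read the object through the S3 API...
body = s3.get_object(Bucket=SOURCE_BUCKET, Key=SOURCE_KEY)["Body"].read()

# ...and write it through the Azure Blob API, so compute in either cloud can
# reach its local copy with that cloud's native tools, no extra drivers needed.
blob = azure.get_blob_client(container=TARGET_CONTAINER, blob=SOURCE_KEY)
blob.upload_blob(body, overwrite=True)
```

Because the copy lands as an ordinary blob, Azure services can read it directly, just as AWS services can read the S3 original, which is exactly the property the requirement describes.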
Requirement # 3 – A Standalone Product
Several vendors are adding extensions to their products to enable some multi-cloud capabilities. These products require the purchase of that vendor’s core storage product even if all the customer wants is the ability to migrate data to the cloud. Additionally, these products tend to treat the vendor’s core product as the center of the universe, with the cloud as a simple extension. As a result, they rarely exploit the full capabilities of the cloud provider’s storage.
While cloud data controllers may come to market from companies with existing storage products, the controller itself is a standalone solution and operates independently from any other product the vendor may produce. The independence of the cloud data controller enables it to provide a single interface to store, manage and search data across many private and public clouds.
Requirement # 4 – Automation and Orchestration
Finally, the cloud data controller should provide a universal API set and workflow engine that enables data to flow automatically from on-premises to public cloud, and between public clouds, based on business policies. Where automation exists in other solutions, it tends to be non-programmable. For example, a solution may be able to move data from cloud A to cloud B when data reaches a certain age, but it typically can’t move data based on current workload conditions. Automation such as workload-based data movement requires the analysis of resources external to the storage. To enable that analysis, the controller needs to provide both a scriptable API set and a workflow engine.
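The toy policy below illustrates the difference: it combines an object-age rule with an external signal before moving data to a cheaper location. The bucket names and the headroom check are assumptions standing in for whatever signals a real workflow engine would consume, and a controller would express this as policy rather than as a hand-written script.

```python
# Illustrative policy sketch only: move objects older than 90 days to a
# cheaper tier, but only when the target location reports spare headroom.
# Bucket names and the headroom signal are assumptions, not a product API.
from datetime import datetime, timedelta, timezone

import boto3

s3 = boto3.client("s3")

HOT_BUCKET = "cloud-a-hot"       # assumed source bucket
COLD_BUCKET = "cloud-b-archive"  # assumed destination bucket (same endpoint)
AGE_THRESHOLD = timedelta(days=90)


def target_has_headroom() -> bool:
    """Placeholder for an external signal (cost, load, spot pricing, etc.)."""
    return True


def run_policy() -> None:
    cutoff = datetime.now(timezone.utc) - AGE_THRESHOLD
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=HOT_BUCKET):
        for obj in page.get("Contents", []):
            # LastModified stands in for access time, which the S3 listing
            # API does not expose.
            if obj["LastModified"] < cutoff and target_has_headroom():
                s3.copy_object(
                    Bucket=COLD_BUCKET,
                    Key=obj["Key"],
                    CopySource={"Bucket": HOT_BUCKET, "Key": obj["Key"]},
                )
                s3.delete_object(Bucket=HOT_BUCKET, Key=obj["Key"])


if __name__ == "__main__":
    run_policy()
```

The age check is something simple tiering products already do; it is the external `target_has_headroom()` signal, fed in through a scriptable API, that requires the workflow engine the requirement calls for.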
Why Do Organizations Need a Cloud Data Controller?
The initial benefit of a cloud data controller is that it gives the organization a single view of all of its object storage resources, both on-premises and in the cloud. It enables the organization to move data, or continuously update data in other clouds, so that the data is available for processing in an alternate cloud location.
An organization can use the cloud data controller to manage cloud storage resources better, and to simplify data management across multiple cloud APIs. For example, it can leverage the automation and orchestration features to move infrequently accessed data to less expensive cloud storage, or even back on-premises to an object storage system it owns instead of one to which it subscribes. Over time, owning a large storage pool is typically less expensive than renting it.
An organization can also use the cloud data controller to distribute data between multiple clouds to take advantage of spot pricing of computing resources. In this use case, the controller continuously updates data across public cloud providers so that it is strategically positioned if a spot pricing advantage appears.
An organization could also use the cloud data controller to provide disaster recovery capabilities for cloud-native applications. Once again, the controller continuously updates data between two or more public cloud providers; if there is a failure in the organization’s primary cloud, it can start operations in the secondary cloud.
The organization can also use the cloud data controller in edge and IoT situations. In this example, the organization typically has multiple small private object stores residing close to where data collection or generation occurs. Once that data is stored on the local object store, the cloud data controller copies it to a centralized store in a public cloud, where the organization can leverage cloud computing resources to process and analyze the data from all locations.
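A minimal fan-in sketch of that edge pattern is shown below. The edge endpoints, bucket names and credentials are illustrative assumptions; a controller would run this replication as a managed, continuous workflow rather than a polling script.

```python
# Rough fan-in sketch: pull objects from several edge object stores into one
# central cloud bucket for analysis. Endpoints and bucket names are
# illustrative; lists are capped at 1,000 objects per call in this sketch.
import boto3

EDGE_SITES = {
    "factory-east": "https://s3.factory-east.example.com",
    "factory-west": "https://s3.factory-west.example.com",
}
CENTRAL_BUCKET = "iot-central"   # assumed bucket in the public cloud

central = boto3.client("s3")     # central cloud credentials from environment

for site, endpoint in EDGE_SITES.items():
    edge = boto3.client("s3", endpoint_url=endpoint)
    for obj in edge.list_objects_v2(Bucket="sensor-data").get("Contents", []):
        body = edge.get_object(Bucket="sensor-data", Key=obj["Key"])["Body"].read()
        # Prefix with the site name so the central store keeps provenance.
        central.put_object(Bucket=CENTRAL_BUCKET, Key=f"{site}/{obj['Key']}", Body=body)
```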
Finally, the cloud data controller supports multi-site content distribution. Instead of collecting data, the edge now receives it, so that regional users experience better performance and lower latency. The cloud data controller enables the organization to pre-populate various cloud locations (public or private) so that the data is available for local processing.
Conclusion
A cloud data controller is vital to hybrid IT: it creates coherence across the inevitable use of multi-cloud services within the enterprise. Without a cloud data controller, the organization must either manually move data between clouds or give up on the idea of the hybrid data center altogether. Hybrid IT, equipped with a cloud data controller, can move or copy data between all the various storage locations the organization might have. The organization can then start applications and workloads in any cloud or location, for any reason. The cloud data controller delivers the functionality that hybrid IT first promised.
About Our Partner
Scality builds the most powerful storage tools to make data easy to protect, search and manage anytime, on any cloud, giving users the freedom and control necessary to be competitive in a data-driven economy. Recognized as a leader in distributed file and object storage by Gartner and IDC, Scality helps businesses meet the challenges of the fourth industrial revolution. In September 2018, Scality introduced Zenko 1.0, the industry’s first and only cloud-native multi-cloud data controller. Available now in both open source and enterprise editions, Zenko was built from the ground up to address the challenges of modern cloud application development and data management, overcoming the complexity and lock-in that come with today’s cloud storage services.
Zenko 1.0 open source edition is available for free download at https://www.zenko.io/. Visit Scality.com for more on Zenko enterprise edition.