The writing’s on the wall: eventually, almost every organization will use cloud storage in one form or another. While the use cases range from production applications to backup, the economics may be too appealing for an organization to ignore. “Economics” does not necessarily mean “cheaper,” especially for long-term data storage, but the operational advantages of the cloud may be too compelling to pass up.
There are two serious omissions, though, that must be addressed before cloud storage is truly ready for this expanding role: one is cloud-to-cloud data movement and the other is dual-authenticated deletion. In this column, we’ll look at cloud-to-cloud data movement. In the next, we’ll look at protecting against unauthorized account deletion. In the third column of this three-part series, we’ll take a hard look at the economic aspects of cloud storage.
Cloud to Cloud Data Movement
The ability to move data from one cloud to another is, today, almost nonexistent. An organization can, if it really wants, move to another cloud, but that process means copying all data from cloud A to on-premises storage, then copying that data to cloud B. Not only is that time-consuming, but depending on how entrenched the organization is in the cloud, it may be impossible if the organization has more data in the cloud than it has capacity for on-premises.
Even if cloud providers allow direct cloud interconnectivity (which they should and eventually will) and software solutions all support that connectivity, there is still the issue of data gravity. Moving 1PB of data, even over a direct connection, is going to take a tremendous amount of time.
The net result of the current state of affairs is that organizations will only move data when they absolutely must, not when it makes sense to do so. The short-term consequence is that an organization has to be completely sure it likes its cloud provider, both now and in the future.
The Solution – Data Streaming
The solution to the data gravity problem and the lack of direct cloud interconnect is to stream data in much the same way applications stream media today. The application is started in cloud B, and data is streamed to it as it is needed. As a background process, all the data in cloud A can be sent to cloud B, but it doesn’t have to be.
What if, for example, cloud B has an excellent face recognition engine that the organization wants to leverage, but all of its video surveillance data is in cloud A? Instead of copying all the data over to cloud B, the organization could stream just the data it needs analyzed. Once the answer is found, it can shut down the process in cloud B and remove any working dataset.
Another example is disaster recovery. The organization could ensure that only the most active data is available in both cloud A and cloud B, keeping the cost of cloud B to a minimum. But if cloud A suffered a major outage, the organization could run the majority of its applications in cloud B. Also, in many cases a cloud outage is not a total failure; an application or service may be down while other components, like storage, are still running. If storage is still active, that data could still be streamed.
How to Create Data Streaming
Streaming data between clouds requires a cloud-based caching mechanism. The caching software would run in both clouds and would ensure either that only the most active data resides in cloud B (in the DR use case) or that, when an application is started in cloud B, just the data it needs at that moment is sent over from cloud A.
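To make the read-through idea concrete, here is a minimal sketch in Python. In-memory dictionaries stand in for the object stores in cloud A and cloud B, and a small LRU bound stands in for the cost cap on cloud B; all names here are illustrative, not any vendor’s real API.

```python
class StreamingCache:
    """Read-through cache: cloud B holds only objects that have
    actually been requested; everything else stays in cloud A."""

    def __init__(self, cloud_a, cloud_b, capacity=3):
        self.cloud_a = cloud_a    # authoritative store (key -> bytes)
        self.cloud_b = cloud_b    # working set kept in the second cloud
        self.capacity = capacity  # caps cloud B storage costs
        self.order = []           # least-recently-used bookkeeping

    def read(self, key):
        if key in self.cloud_b:              # hit: served locally in cloud B
            self.order.remove(key)
        else:                                # miss: stream from cloud A on demand
            self.cloud_b[key] = self.cloud_a[key]
            if len(self.cloud_b) > self.capacity:
                evicted = self.order.pop(0)  # evict the coldest object
                del self.cloud_b[evicted]
        self.order.append(key)
        return self.cloud_b[key]

# An application started in cloud B pulls only the objects it touches.
cloud_a = {f"video-{i}": f"frames-{i}".encode() for i in range(10)}
cache = StreamingCache(cloud_a, cloud_b={})
cache.read("video-7")
cache.read("video-2")
print(sorted(cache.cloud_b))  # only the touched objects live in cloud B
```

The point of the sketch is the access pattern, not the data structure: the application in cloud B never waits for a bulk copy, and cloud B never holds more than its working set.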
An alternative is a storage system that is multi-cloud from the start. These systems will likely be scale-out in nature and support the concept of nodes in multiple locations (or clouds). With this model, data could move easily from on-premises to cloud A and then to cloud B, based on policy.
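A policy in such a system might be as simple as mapping data age to a location tier. The following is a hypothetical sketch of that mapping, with made-up thresholds and location names, purely to illustrate the policy-driven placement the paragraph above describes.

```python
from datetime import date, timedelta

def place(last_accessed, today):
    """Hypothetical placement policy for a multi-location scale-out store:
    hot data stays on-premises, warm data in cloud A, cold data in cloud B."""
    age = today - last_accessed
    if age <= timedelta(days=30):
        return "on-premises"
    if age <= timedelta(days=365):
        return "cloud-a"
    return "cloud-b"

today = date(2024, 6, 1)
print(place(date(2024, 5, 20), today))  # recently touched: stays on-premises
print(place(date(2024, 1, 1), today))   # warm: lands in cloud A
print(place(date(2020, 1, 1), today))   # cold: lands in cloud B
```

A real policy engine would weigh more than age (compliance, egress cost, application affinity), but the shape is the same: the placement decision is data about the data, not a manual migration.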
Both of these solutions provide multi-cloud capabilities without requiring a second public cloud: the original data center, if the organization wants it to, can serve as cloud B.
If data is going to the cloud, then IT needs to make sure that data is available either in multiple clouds or in one cloud and on-premises. All the major cloud providers have had well-documented outages, most as the result of human error. Organizations need to assume that their cloud provider of choice will have an outage that interrupts operations, and they need an alternative site where data can be accessed and applications can be started.