As organizations move into the cloud era, they have to deal with data gravity: data sets have mass, and the larger they are, the longer it takes to move them from point A to point B. Data gravity is not only a problem for organizations looking to move to a single cloud but also for organizations looking to move to multiple clouds.
Cloud Egress Fees Monetize Data Gravity
Cloud egress fees are charges for moving data out of a cloud provider, whether to another cloud or back to on-premises storage. With only a few exceptions, all major cloud providers levy an egress fee on organizations looking to move data from one cloud to another. These providers have, in effect, monetized data gravity: the larger the data set, the heavier its weight and the more expensive it is to move to another provider.
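To see how quickly those charges scale with data set size, consider a rough back-of-the-envelope estimate. The per-GB rate below is an assumed, illustrative figure only; actual egress pricing varies by provider, region, destination, and monthly volume tier.

```python
# Back-of-the-envelope egress cost estimate. The rate below is an assumed,
# illustrative figure; real pricing varies by provider, region, and volume tier.

ASSUMED_RATE_PER_GB = 0.09  # hypothetical flat rate, USD per GB transferred out


def egress_cost_usd(data_set_tb: float, rate_per_gb: float = ASSUMED_RATE_PER_GB) -> float:
    """Cost of moving a data set of the given size (in TB) out of a cloud."""
    return data_set_tb * 1024 * rate_per_gb


for size_tb in (1, 10, 100, 500):
    print(f"{size_tb:>4} TB  ->  ${egress_cost_usd(size_tb):>10,.2f}")
```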
Egress fee charges on an organization's bill are very hard to understand, making cloud invoices reminiscent of early cellular phone bills, where no one really understood what they were being charged for. In addition to complicating accounting, egress fees make it expensive to flexibly leverage multiple clouds for different purposes.
Egress Fees Threaten a Multi-Cloud Future
Cloud providers all differ in the capabilities and services they offer. Organizations want to move workloads between providers to leverage unique capabilities and to take advantage of special pricing. Egress fees, along with the time it takes to move data, are a significant stumbling block to that goal. Many organizations find it easier and more cost effective to keep a copy of their data in every cloud at all times. The challenge is keeping that data up to date in each cloud provider, all the time.
Using a Data Pipeline to Deflate Data Gravity
A data pipeline is more than a backup or migration job. It is a continuous flow of information, stored in a native format that is immediately accessible to each provider's compute infrastructure. Unlike a backup job, information is captured in real time and sent to secondary repositories. In the cloud era, an organization can simultaneously update Amazon Web Services (AWS), Google Cloud Platform, and Microsoft Azure instances in near real time.
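As a conceptual illustration only, not Actifio's implementation or any vendor's API, the sketch below shows the fan-out idea behind a data pipeline: each change captured at the source is applied to every registered target, so every copy stays current in near real time. The CloudTarget, InMemoryTarget, and DataPipeline names are hypothetical stand-ins for real provider SDKs and replication machinery.

```python
# Conceptual sketch of a fan-out data pipeline, not any vendor's product or API.
# Each change captured at the source is applied to every registered target, so
# every cloud copy stays current in near real time. The target classes here are
# hypothetical stand-ins for real provider SDKs (S3, Cloud Storage, Blob Storage).
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass
class Change:
    key: str        # logical object being updated (file, volume block range, etc.)
    payload: bytes  # the new data for that object


class CloudTarget(Protocol):
    """Anything that can accept a replicated change."""
    name: str

    def apply(self, change: Change) -> None: ...


class InMemoryTarget:
    """Stand-in for a real cloud target; stores objects in a dictionary."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.objects: dict[str, bytes] = {}

    def apply(self, change: Change) -> None:
        self.objects[change.key] = change.payload


class DataPipeline:
    """Captures each change once and fans it out to every registered target."""

    def __init__(self, targets: list[CloudTarget]) -> None:
        self.targets = targets

    def capture(self, change: Change) -> None:
        for target in self.targets:
            target.apply(change)


if __name__ == "__main__":
    targets = [InMemoryTarget("aws"), InMemoryTarget("gcp"), InMemoryTarget("azure")]
    pipeline = DataPipeline(targets)
    pipeline.capture(Change(key="orders/2019-06.db", payload=b"latest block"))
    for t in targets:
        print(t.name, list(t.objects))
```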
With the same copy of data in each location at the same time, organizations are free to operate on that data using the best tools each provider offers. Once those operations are complete, only the results need to be sent back to the primary data center, if anything at all. A data pipeline can dramatically reduce egress fees as well as eliminate waiting for a particular cloud to have the right data set.
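A rough comparison, again using an assumed flat rate, shows why returning only results rather than the full working data set deflates the egress bill. The sizes and rate are illustrative, not measured figures.

```python
# Illustrative only: compare sending the full working set back to the data
# center versus sending just the results, at the same assumed $0.09/GB rate.
RATE_PER_GB = 0.09             # hypothetical flat egress rate, USD per GB
full_data_set_gb = 100 * 1024  # a 100 TB working copy
results_gb = 50                # a summarized results set

print(f"Full data set egress: ${full_data_set_gb * RATE_PER_GB:,.2f}")  # $9,216.00
print(f"Results-only egress:  ${results_gb * RATE_PER_GB:,.2f}")        # $4.50
```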
At its Data Driven conference, Actifio explained how its Virtual Data Platform solution creates a virtual data pipeline between on-premises and cloud storage. Users can start by using the data pipeline for data protection and disaster recovery, and then leverage those same data instances for analytics, business intelligence, artificial intelligence and machine learning.
StorageSwiss Take
Many organizations look at their cloud strategy as something separate from other processes within IT. Instead, organizations should integrate data protection, data management and cloud strategy into a single best practice. A data pipeline is the necessary foundation for that practice because it ensures the latest copy of data is available everywhere the organization needs it to be, without having to wrestle with data gravity.