As organizations move into the cloud era, they have to deal with data gravity. Data gravity means that data has mass: the larger a data set, the longer it takes to move it from point A to point B. Data gravity is a problem not only for organizations moving to a single cloud but also for those looking to use multiple clouds.
Cloud Egress Fees Monetize Data Gravity
Cloud egress fees are the charges associated with moving data from one cloud provider to another, or back to on-premises storage. With only a few exceptions, all major cloud providers levy an egress fee on organizations looking to move data out of their cloud. In effect, these providers have monetized data gravity: the larger a data set, the heavier its weight and the more expensive it is to move to another provider.
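To make that weight concrete, a quick back-of-the-envelope calculation shows how egress cost scales linearly with data set size. The per-GB rate below is purely illustrative, not any provider's actual price; real pricing is tiered and varies by provider, region and destination.

```python
def egress_cost_usd(dataset_gb: float, rate_per_gb: float = 0.09) -> float:
    """Estimate the cost of moving a data set out of a cloud.

    rate_per_gb is an assumed, illustrative flat rate; actual egress
    pricing is tiered and provider-specific.
    """
    return round(dataset_gb * rate_per_gb, 2)

# Moving a 50 TB data set at the assumed rate:
print(egress_cost_usd(50 * 1024))  # -> 4608.0
```

Doubling the data set doubles the bill, which is exactly why heavier data sets feel more "stuck" where they are.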
Egress fee charges on an organization's bill are hard to understand. They make cloud invoices reminiscent of early cellular phone bills, where no one really understood what they were being charged. In addition to creating a complex accounting situation, egress fees make it expensive to flexibly leverage multiple clouds for different purposes.
Egress Fees Threaten a Multi-Cloud Future
Cloud providers differ in the capabilities and services they offer. Organizations want to move workloads between providers to leverage unique capabilities and to take advantage of special pricing. Egress fees, in addition to the time it takes to move data, are a significant stumbling block to that goal. Many organizations find it easier and more cost-effective to keep data in all clouds at all times. The challenge is making sure the data in each cloud provider is up to date all the time.
Using a Data Pipeline to Deflate Data Gravity
A data pipeline is more than a backup or migration job. It is a continuous flow of information, stored in a native format that is immediately accessible by each provider's computing infrastructure. Unlike a backup job, information is captured in real time and sent to secondary repositories. In the cloud era, organizations can simultaneously update instances in Amazon Web Services, Google Cloud and Microsoft Azure in near real time.
With the same copy of data in each location at the same time, organizations are free to operate on this data using the best tools available to them from each provider. Then once operations are complete, it is only necessary to send the results back to the primary data center, if at all. A data pipeline can dramatically reduce egress fees as well as eliminate waiting for a particular cloud to have the right data set.
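The fan-out idea behind a pipeline like this can be sketched in a few lines. The class and method names below are hypothetical illustrations, not Actifio's API: each change is applied to every registered cloud target as it arrives, so all copies stay identical without a separate migration job.

```python
from dataclasses import dataclass, field


@dataclass
class CloudTarget:
    """A stand-in for one cloud provider's storage (hypothetical)."""
    name: str
    store: dict = field(default_factory=dict)

    def apply(self, key: str, value: bytes) -> None:
        # In a real pipeline this would be a provider-specific write.
        self.store[key] = value


class DataPipeline:
    """Fan out every change to all registered cloud targets."""

    def __init__(self, targets: list[CloudTarget]):
        self.targets = targets

    def write(self, key: str, value: bytes) -> None:
        for target in self.targets:
            target.apply(key, value)


targets = [CloudTarget("aws"), CloudTarget("azure"), CloudTarget("gcp")]
pipeline = DataPipeline(targets)
pipeline.write("orders/2024-01.parquet", b"...")
# Every target now holds the same copy of the data.
```

A production pipeline adds change capture, batching and retry logic, but the essential property is the same: one write, many synchronized copies, so no bulk egress is needed later.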
At its Data Driven conference, Actifio explained how their Virtual Data Platform solution creates a virtual data pipeline between on-premises and cloud storage. Users can start by using the data pipeline for data protection and disaster recovery and then leverage those same data instances for analytics, business intelligence, artificial intelligence and machine learning.
Many organizations look at their cloud strategy as something separate from other processes within IT. Instead, organizations should integrate data protection, data management and cloud strategy into a single best practice. A data pipeline is the necessary foundation for that practice because it ensures that the latest copy of data is available everywhere the organization needs it, without having to wrestle with data gravity.