Primary storage vendors have been slow to integrate the public cloud into their solutions. Most pretend the public cloud doesn’t exist, and even the vendors that do provide some level of integration use the cloud only as a giant digital graveyard for inactive data. While using the cloud for backup, archive and disaster recovery is important, IT professionals want more. For example, they want to leverage the available compute resources to generate results and answers faster, or to handle peak data loads. The problem is that this type of use case is much more expensive from a storage perspective, since these workloads run on a service like Amazon EBS (Elastic Block Store) instead of S3 (Simple Storage Service).
Leveraging Cloud Compute Step-by-Step
The first step in leveraging cloud compute is getting data to the cloud. Once the data arrives, it has to be in a format the compute resources can access before it can be operated on. This means the movement either has to be native or the data has to be transformed when it gets to the cloud.
The second step is keeping the cloud copy of the data updated, so that when the need to analyze it arises or a peak load hits, the cloud already has all the data it needs to spring into action.
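The pattern behind keeping a cloud copy current is incremental replication: compare the two copies and move only what changed. A minimal sketch of that idea, using plain Python dictionaries as hypothetical stand-ins for the on-premises file system and the cloud object store (in practice the final copy step would be an object PUT to a service like S3):

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Content hash used to detect differences between the two copies."""
    return hashlib.sha256(data).hexdigest()

def incremental_sync(local: dict, remote: dict) -> list:
    """Bring the remote (cloud) copy up to date with the local copy,
    transferring only objects that are new or whose content changed.
    Returns the names of the objects that were copied."""
    changed = [
        name
        for name, data in local.items()
        if name not in remote or fingerprint(remote[name]) != fingerprint(data)
    ]
    for name in changed:
        remote[name] = local[name]  # in a real system: upload to object storage
    return changed
```

For example, if only one of three files changed since the last sync, only that file is transferred, which is what keeps continuous replication affordable compared with re-copying the whole dataset.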
The second step leads to a third: deciding on which type of storage to place the data. Amazon EBS is designed to deliver high performance and deterministic IO, whereas Amazon S3 is designed to deliver very cost-effective storage. In most cases, the two services are not used in combination with each other; most customers have to make an either-or choice.
For the most part, applications will need to access their data on the EBS tier to get the performance and data consistency they need. The problem is that EBS is the most expensive tier within the Amazon storage portfolio, so having data sit there waiting for use is wasteful. It makes more sense to keep the data on the S3 object store, but then the data is either inaccessible to the application running in the cloud or won’t deliver the desired performance.
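A back-of-the-envelope comparison makes the cost gap concrete. The per-GB-month prices below are illustrative assumptions, not current AWS pricing (check the AWS pricing pages for real rates), but the linear model is enough to show why parking an idle dataset on block storage is wasteful:

```python
def monthly_storage_cost(gb: float, price_per_gb_month: float) -> float:
    """Simple linear storage cost model: capacity times unit price."""
    return gb * price_per_gb_month

# Assumed, illustrative unit prices in $/GB-month:
EBS_PRICE = 0.10   # hypothetical general-purpose SSD block storage rate
S3_PRICE = 0.023   # hypothetical S3 Standard object storage rate

dataset_gb = 50_000  # a hypothetical 50 TB analytics dataset

ebs_cost = monthly_storage_cost(dataset_gb, EBS_PRICE)  # 5000.0 per month
s3_cost = monthly_storage_cost(dataset_gb, S3_PRICE)    # 1150.0 per month
```

Under these assumed rates, the block-storage copy costs roughly four times the object-storage copy every month it sits idle, which is the waste the article describes.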
Solving the Cloud Compute Storage Challenge
Elastifile provides a hardware and cloud-agnostic distributed file storage architecture whose namespace can span multiple sites, including multiple clouds. Recently, Elastifile added a capability it calls CloudConnect. CloudConnect replicates specific datasets to a cost-effective cloud-based object store like Amazon S3 and then continuously updates that store as changes are made on-premises. If the organization decides it wants to leverage cloud compute for some reason, CloudConnect can move that dataset to EBS for processing in the cloud via the Elastifile Cloud File System (ECFS). Then, when the answer is derived or the peak load has passed, CloudConnect can move the data back to S3 and update the on-premises file system, if needed.
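The replicate, promote, and demote lifecycle described above can be sketched as a small state machine. The class and method names below are hypothetical illustrations of the general pattern, not Elastifile's actual CloudConnect API:

```python
from enum import Enum

class Tier(Enum):
    ON_PREM = "on-premises"
    S3 = "s3"    # cost-effective object store for long-term residence
    EBS = "ebs"  # high-performance block store, used only while computing

class Dataset:
    """Hypothetical model of a dataset moving through the
    replicate -> promote -> demote lifecycle."""

    def __init__(self, name: str):
        self.name = name
        self.tier = Tier.ON_PREM

    def replicate_to_s3(self) -> None:
        # Continuous replication keeps a current copy in cheap object storage.
        self.tier = Tier.S3

    def promote_to_ebs(self) -> None:
        # Move to block storage only when cloud compute actually needs it.
        assert self.tier == Tier.S3, "promote only from the S3 copy"
        self.tier = Tier.EBS

    def demote_to_s3(self) -> None:
        # The answer is derived or the peak has passed: return to cheap storage.
        assert self.tier == Tier.EBS, "demote only from the EBS copy"
        self.tier = Tier.S3
```

The design point is that the expensive tier is occupied only for the duration of the compute job; the dataset's steady state is always the inexpensive object store.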
The result is that the organization can fully leverage the cloud when it needs to, and take advantage of spot pricing offers as they become available.
There are many facets to an organization’s cloud journey. It typically starts with using the cloud for backup, then transitions to disaster recovery and perhaps archive. Eventually, the organization wants to leverage cloud compute, so its data center can be built for the “norm” while the cloud handles peak demands on infrastructure. The challenge has been that making the cloud ready to take over processing meant pre-seeding data to an EBS-like tier, which gets expensive. Elastifile’s ability to use S3 as the long-term store and then move the data to EBS when needed is an ideal way around this problem.