Cloud storage is typically known for its seemingly infinite capacity and low upfront costs. It is also operationally more economical, since the organization does not need to house the hardware in its own data center, and outsourcing the hardware eliminates the need to manage upgrades and migrate to new storage platforms. Enterprises, though, typically view cloud storage as a place to archive old data, and production cloud storage is often limited to cloud-native applications. The primary reason for these restricted use cases is not a limitation of cloud storage itself but the inherent latency between the data center and cloud storage.
How Data Centers Interface with the Cloud
For the data center to take full advantage of cloud storage, it needs to “exchange” information with the cloud, not just “send” information to it. Sending information is fine for the traditional use cases of cloud storage, like archiving or migrating data to cloud-native applications. The “sending” method only has to deal with cloud latency as it uploads data to the cloud, and it typically does so by using data efficiency technologies like deduplication and compression, as well as WAN optimizations like IP packet shaping. Where these approaches fall short is when the primary data center needs data back from the cloud. Most cloud providers charge an egress fee to pull data from their clouds, and of course there is a performance concern in transporting all this data across the Internet. Use cases like backup (because of recovery) or using cloud storage to support data center-based applications require an on-premises appliance to act as a local cache in some way.
The Data Center, The Edge, and the Cloud
There are three endpoints that organizations should integrate in some way; the data center, the edge, and the cloud. The data center, or data centers, is of course where most of the organization’s applications run and where, in many cases, the bulk of their data resides.
The edge can vary depending on the use case. It can be an IoT sensor on a car that sends information back to the primary data center, or it can be a remote office with a handful of users. The edge, though, is increasingly important: endpoints continue to come online, and the data generated at the edge needs protection.
First Generation Cloud Approaches
First-generation cloud connectivity uses an on-premises cloud cache, but these caches really act more like storage tiers in that they hold a unique copy of data much longer than a typical cache would. A typical cache works by storing the most active data on higher performance, lower latency devices than the storage it is front-ending. Flash, for example, is an excellent cache for hard disk storage. The key, though, is that a cache should hold the only copy of a piece of data for just a few seconds, which is not the approach of first-generation cloud solutions.
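The hot-data behavior a typical cache provides can be sketched as a small least-recently-used (LRU) cache. This is a generic illustration, not any vendor's implementation; the class name and capacity are assumptions for the example.

```python
from collections import OrderedDict

class LRUCache:
    """Keeps only the most recently accessed blocks on the fast tier;
    everything else is evicted back to the slower storage it fronts
    (hard disk, or in the cloud case, remote object storage)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # block_id -> data, oldest first

    def get(self, block_id):
        if block_id not in self.blocks:
            return None  # cache miss: caller must fetch from the backing tier
        self.blocks.move_to_end(block_id)  # mark as most recently used
        return self.blocks[block_id]

    def put(self, block_id, data):
        self.blocks[block_id] = data
        self.blocks.move_to_end(block_id)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used
```

With a two-block capacity, writing a third block silently evicts the coldest one; the next read of that block becomes a cache miss that must go to the backing tier, which is exactly where cloud latency bites.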
The cloud, because of distance, creates a new caching challenge. A cache miss can negatively affect application performance, to the point of crashing the application, ruining the user experience, or making recoveries useless. The latency problem, as a result, either severely limits the use cases for cloud storage in support of on-premises applications or forces organizations to size the appliance so that there is never a cache miss. Limiting use cases typically means using the cloud appliance only for unstructured data. One example is using the cache to store user data, sizing it large enough to rarely have a cache miss, and moving only very infrequently accessed data, older than a couple of years, to the cloud.
For first-generation cloud solutions to move beyond caching basic user data and support production applications, the cache needs to be sized to match the full production data set. Sizing the appliance so large that there is never a cache miss eliminates the very reason for purchasing it: to leverage the cloud for data center storage. If the cloud appliance never has a cache miss, the customer is essentially buying twice as much storage as they were before cloud storage even became an option.
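The sizing dilemma comes down to simple arithmetic: average access time is dominated by the miss penalty. The latency figures below are illustrative round numbers, not measured values from any product.

```python
def effective_latency_ms(hit_rate, cache_ms, backend_ms):
    """Average access latency for a cache in front of a slower backend."""
    return hit_rate * cache_ms + (1 - hit_rate) * backend_ms

# Flash cache (~0.5 ms) in front of public cloud storage (~50 ms round trip).
# Even at a 99% hit rate the average stays tolerable, but every individual
# miss still stalls the application for the full ~50 ms, and at a 90% hit
# rate the average climbs past what primary applications can accept.
avg_99 = effective_latency_ms(0.99, 0.5, 50.0)  # roughly 1 ms on average
avg_90 = effective_latency_ms(0.90, 0.5, 50.0)  # roughly 5.5 ms on average
```

The only first-generation fix is pushing the hit rate toward 100%, which means sizing the cache toward the full data set; the alternative is shrinking the miss penalty itself, which is the next-generation approach described below.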
Another potential workaround is to migrate the entire application to the cloud and make it a cloud-native application. The problem with migrating the application is that the organization first needs to do extensive development work to make the application cloud-ready, often entirely re-writing it for the cloud.
Next-Generation Self-Protecting Storage as a Service
The primary stumbling block to using cloud storage for data center storage is the latency induced by the distance between the organization’s data center and the provider’s data center. A potential solution is to continue to use the on-premises caching approach, but have it cache to a location closer than the typical public cloud provider can offer. Next-generation cloud storage solutions create a middle tier between the primary data center and the large cloud provider. There are plenty of data center-class facilities available, typically called co-location facilities. Equinix, for example, has 52 co-location facilities around the world, which means the vast majority of businesses will have low latency access to one of them.
Next-generation solutions need to provide software that brings together the data center, the middle tier service provider, and the public cloud provider into a single, holistic storage network. Typically, the middle tier deployment model uses a small, all-flash cache on-premises, just large enough to hold data accessed daily, stores older data at a relatively close middle tier data center, and replicates data from there to a public cloud provider. When an application or user adds or changes data on the storage cache, the solution acknowledges the write locally, keeping performance high, and then writes all data through to both the middle tier and the public cloud provider. The solution also automatically optimizes data placement across hot, warm, and cold tiers.
When an application or user reads data, it reads it from the local cache. If the requested data is not in the local cache, the solution retrieves it from the middle tier, at the edge, before falling back to the public cloud. The middle tier adds only a few milliseconds of latency, which means applications are not impacted, and it enables IT to keep the on-premises cache small and performance very high.
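The write-through and tiered read paths described above can be sketched roughly as follows. The tier names, in-memory dictionaries, and eviction policy are illustrative assumptions, not any vendor's actual design.

```python
class TieredStore:
    """Write-through on-premises cache over a middle tier and a cloud tier."""

    def __init__(self, cache_capacity):
        self.capacity = cache_capacity
        self.cache = {}        # on-premises all-flash cache (hot data)
        self.middle_tier = {}  # nearby co-location facility (warm data)
        self.cloud = {}        # public cloud storage (full copy of all data)

    def write(self, key, data):
        # Acknowledge locally for performance, but write through to both
        # remote tiers so the cache never holds the only copy of data.
        self.cache[key] = data
        self.middle_tier[key] = data
        self.cloud[key] = data
        if len(self.cache) > self.capacity:
            # Evict the oldest entry; it remains safe in the other tiers.
            self.cache.pop(next(iter(self.cache)))

    def read(self, key):
        if key in self.cache:        # sub-millisecond: served locally
            return self.cache[key]
        if key in self.middle_tier:  # a few milliseconds: served at the edge
            self.cache[key] = self.middle_tier[key]  # re-warm the cache
            return self.cache[key]
        return self.cloud.get(key)   # tens of milliseconds: rare last resort
```

The design point the sketch illustrates: a miss on the small local cache is almost always absorbed by the milliseconds-away middle tier, so the expensive trip to the public cloud becomes the exception rather than the rule.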
Establishing an “As-a-Service” Model
Another challenge with the first-generation approaches to cloud storage is how the organization purchases them. Cloud storage, like other cloud services, is designed to be consumed “as a service.” The first-generation approaches require the customer to purchase the hardware and software for the on-premises cache upfront. That hardware is substantial, since limiting the impact of latency means sizing the cache almost as big as primary storage. The size of these on-premises caches also makes it difficult for the vendor to provide the solution “as a service,” since the vendor’s upfront costs are also high. The result is that first-generation solutions are not bought as a service; they are bought as products that the customer now owns and must maintain.
Vendors that offer next-generation solutions with a middle tier are in a much better position to provide the solution as a service. Thanks to the rapid response of the middle tier, they don’t need as large an on-premises cache. Additionally, more of the “value add” for the next-generation vendor is in their software, since it is sophisticated enough to span on-premises, the middle tier, and the public cloud.
The Use Cases for Next-Generation Cloud Storage
Next-generation cloud storage solutions hide cloud latency from applications that use them. They present to the application what appears to be a standard block device or network share. Essentially any application that needs a block device or network share can use a next-generation cloud storage solution without modifying the application or workflow. There are however a few ideal starting use cases.
1. Next-Generation Backup
An excellent way to start using next-generation cloud storage is as a backup solution. Organizations can use their existing backup solution and direct those backups to the on-premises cache, presented as either a block device or a network share. The cloud storage solution then automatically copies that data to the middle tier and then eventually to the public cloud. Since most recoveries are from the most recent backup, which is in the cache, the organization more than likely has all the data it needs to service most recovery requests. For the large majority of restores, no data needs to transfer from the public cloud.
Using next-generation cloud storage for backups also positions the organization to easily migrate production data to the cloud storage solution and enables them to use it for all of their production storage. The customer simply restores from backup to the volume on the same cache. Conversion is almost instantaneous.
2. Cloud Storage as Primary Storage
The key concerns over using cloud storage as primary storage are latency, performance, costs, and migration to the service. The middle tier resolves the latency issue. The on-premises cache, again thanks to the middle tier, is small enough that the vendor can afford to offer it as a flash-only cache and as a service. Since most application IO is served from that cache, performance to applications is excellent, and because the capacity requirement is small, costs are reduced. Finally, assuming that the cloud storage solution’s software presents a holistic view across all tiers, exposed as either block devices or network shares, the application can leverage the full solution with no modifications to existing applications.
3. Self-Protecting Storage as a Service
Using the cloud as primary storage still leverages all the capabilities of using the solution for backup. Primary data that exists on the cloud storage solution is still automatically copied to the middle tier and the public cloud. It is also snapshotted to provide point-in-time protection. Those snapshots can be set to read-only for protection from ransomware. Once in the cloud, IT can have the solution automatically replicate copies to other cloud regions, resulting in self-protecting storage as a service.
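The point-in-time, read-only protection described above might be sketched as immutable snapshots of a live volume. The class and method names are hypothetical, for illustration only.

```python
import copy
import time

class SnapshotStore:
    """A primary volume with read-only, point-in-time snapshots."""

    def __init__(self):
        self.volume = {}     # live, writable primary data
        self.snapshots = []  # list of (timestamp, frozen copy), never rewritten

    def snapshot(self):
        # A deep copy that is never written again. Because writes to the
        # live volume cannot reach it, ransomware that encrypts primary
        # data cannot corrupt the snapshot.
        self.snapshots.append((time.time(), copy.deepcopy(self.volume)))

    def restore(self, index=-1):
        # Roll the volume back to a chosen point in time (default: latest).
        self.volume = copy.deepcopy(self.snapshots[index][1])
```

In a real service the frozen copies would live in the middle tier and public cloud, replicated across regions as the article describes, rather than in local memory.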
IT can place cloud storage caches in multiple locations throughout the organization. Doing so makes connecting and establishing a remote office much easier and more cost effective. If that office needs to move or is consolidated, no data has to be moved; a new cache is placed in the new office, and the organization can pick up right where it left off.
4. Disaster Recovery as a Service
Once the enterprise stores data in the cloud, the next step is to leverage the other cloud resource, compute, to create a disaster recovery strategy. The organization can leverage the same storage software, instantiated in the cloud, so that storage in the cloud still appears as if it were traditional block or network share mount points. In the event of a disaster, the organization can point cloud compute at the data and resume operations again without needing to modify applications.
Using the cloud for disaster recovery eliminates the need for the organization to create and maintain a separate DR site. It also means that the organization doesn’t need to pay for compute and storage in that DR site, all of which sits idle waiting for a disaster. DRaaS allows the organization to pay for disaster recovery only when there is an outage.
A Logical, Easy Cloud Journey
Next-generation, hybrid cloud storage solutions, because they leverage the edge and because their software presents storage interfaces that IT is already familiar with, make cloud on-boarding easy. An organization looking to move to the cloud can adopt the solution in small steps, gaining confidence both in the cloud and in the solution along the way. They can start by using the solution to improve their backup process, and then leverage the backup process to migrate applications to cloud storage. Once their application’s data is running on next-generation cloud storage, they can leverage the capabilities of the solution to create self-protecting storage but still keep the actual applications on-premises.
Most importantly, with next-generation cloud storage solutions, the entire storage architecture is consumed as a service. If for some reason it doesn’t work out, the organization can stop using it without risking a large, sunk CapEx investment. If the solution does work, starting slowly doesn’t penalize the organization; each step simply builds the foundation for the next step of the expansion.
An example of a next-generation solution is ClearSky Data. ClearSky addresses the cloud latency problem head-on by leveraging the edge for optimal performance and providing software to automatically distribute data across tiers. In fact, ClearSky recently announced a partnership with Equinix to expand the number of edge locations available to ClearSky customers.
ClearSky is consumed as a service. The company places an all-flash cache in the customer’s data center, which provides access to the ClearSky network. The ClearSky software manages that cache and, as data is added to it, writes the data through to the points of presence at the edge and to the public cloud. The software makes sure that the most active data is kept close, and any IO requests for data not in the cache are served by the point of presence at the edge, which is only milliseconds away in terms of latency. All data is automatically stored offsite in the public cloud.
ClearSky customers gain protection for all their data without replication or additional backup and DR software of any kind. Because it’s ClearSky’s network, the customer does not have to worry about connectivity from the data center to the edge and to the cloud – it’s seamless to the end user. Customers can follow the path described above to make their cloud strategy a reality without migration or re-writing of apps. They could start by using the solution as a backup enhancement, then leverage those backups for migrating data to the cloud and eventually use the service to truly use cloud storage as primary storage, creating a self-protecting storage as a service environment.
For the cloud to be truly useful it has to provide value to core data center applications and the use cases have to be more than storing backup copies. Cloud as primary storage is a reality now thanks to the concept of the edge which reduces latency and improves performance. A logical way to start, though, is to use the solution to improve the data protection process and to reduce backup storage costs. ClearSky gives customers a path to full-service utilization and lets them journey down that path at the pace with which they are comfortable.
Sponsored by ClearSky Data
Watch our on-demand webinar with Storage Switzerland and ClearSky to learn why current cloud solutions aren't doing enough to eliminate backups or simplify DR and how a complete, hybrid cloud storage service can meet these goals without compromising performance.