When organizations attempt to create a hybrid cloud storage architecture to better manage costs and ease on-premises storage requirements, they face a significant hurdle: the latency between on-premises storage and cloud-based storage. Transferring files between these locations, while an issue, is not the primary concern; accessing the metadata is. Without proper metadata management, a simple directory listing that spans cloud-based data can take longer than actually transferring a file. Cloud vendors have offered a variety of solutions to overcome the latency hurdle, but most fall short, which holds back cloud adoption.
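To see why metadata, rather than bulk transfer, can dominate, consider a rough model. The sketch below is illustrative only. It assumes each metadata request costs one full network round trip and that requests are issued one after another. The round-trip time, bandwidth, and directory size are hypothetical placeholders, not measurements of any particular cloud service, and the function names are our own.

```python
# Rough latency model: a remote directory listing vs. a single file transfer.
# All inputs are hypothetical; plug in figures from your own environment.


def listing_time_s(entries: int, rtt_ms: float, entries_per_request: int = 1) -> float:
    """Seconds to list a directory whose metadata lives across the WAN.

    Assumes requests are sequential, each costs one round trip, and each
    returns metadata for up to `entries_per_request` entries.
    """
    requests = -(-entries // entries_per_request)  # ceiling division
    return requests * rtt_ms / 1000.0


def transfer_time_s(size_mb: float, bandwidth_mbps: float, rtt_ms: float) -> float:
    """Seconds to move one file: one round trip to start, then size / bandwidth."""
    return rtt_ms / 1000.0 + (size_mb * 8) / bandwidth_mbps


if __name__ == "__main__":
    # Hypothetical: 10,000 files, 40 ms round trip, one metadata call per file.
    print(f"listing:  {listing_time_s(10_000, 40.0):.1f} s")
    # Hypothetical: one 100 MB file over a 1 Gbit/s link with the same round trip.
    print(f"transfer: {transfer_time_s(100, 1_000, 40.0):.2f} s")
```

Under these assumptions, the listing takes about 400 seconds while the file moves in under a second. Batching many entries per request (raising `entries_per_request`) narrows the gap. This is why where and how metadata is served matters more than raw bandwidth.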
Hybrid Cloud Storage Workarounds
The first and potentially most apparent solution to creating a hybrid cloud infrastructure is to give up and move 100% of the data to the cloud. While each of the major public cloud providers has at least one file system offering, most organizations are uncomfortable moving all their data to cloud storage. Also, most organizations have significant computing capabilities already on-premises and want the response times that local storage systems provide.
A final challenge is that cloud storage is rented, never owned. Organizations with more than five petabytes of data that require long-term retention (say, more than five years) often find it more cost-effective to own the storage that holds that data than to rent it. On-premises object storage (private cloud storage) systems give organizations a public-cloud-like storage experience while letting them own the capacity.
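The rent-versus-own argument comes down to simple arithmetic. The sketch below is a minimal model under stated assumptions. Renting scales linearly with capacity and time. Owning adds an upfront per-terabyte cost, a yearly running cost, and a fixed overhead that only pays off at scale. The class, its field names, and every dollar figure are hypothetical.

```python
# Hypothetical rent-vs-own comparison for long-term retention.
# Every price below is a placeholder, not a quote from any vendor.
from dataclasses import dataclass


@dataclass(frozen=True)
class CostAssumptions:
    rent_per_tb_month: float  # public cloud price, $ per TB per month
    buy_per_tb: float         # upfront purchase price of owned capacity, $ per TB
    run_per_tb_year: float    # power, space, and support for owned capacity, $ per TB per year
    fixed_own: float          # fixed cost of owning a system at all (staff, base hardware), $


def rent_total(tb: float, years: float, a: CostAssumptions) -> float:
    """Total spend if the capacity is rented for the whole retention period."""
    return tb * a.rent_per_tb_month * 12 * years


def own_total(tb: float, years: float, a: CostAssumptions) -> float:
    """Total spend if the capacity is bought and operated for the period."""
    return a.fixed_own + tb * (a.buy_per_tb + a.run_per_tb_year * years)


if __name__ == "__main__":
    a = CostAssumptions(rent_per_tb_month=10.0, buy_per_tb=200.0,
                        run_per_tb_year=20.0, fixed_own=500_000.0)
    for tb in (500, 5_000):  # 0.5 PB and 5 PB, using 1 PB = 1,000 TB
        r, o = rent_total(tb, 5, a), own_total(tb, 5, a)
        print(f"{tb:>5} TB over 5 years: rent ${r:,.0f}, own ${o:,.0f}")
```

With these made-up inputs, renting wins at 0.5 PB ($300,000 vs. $650,000). Owning wins at 5 PB ($3,000,000 vs. $2,000,000), because the fixed cost of ownership is spread across more capacity. The crossover point moves with real prices, so the five-petabyte figure is a rule of thumb rather than a constant.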
The second option is a cloud gateway, which stores all data in the cloud and caches active data in an on-premises appliance. The on-premises cache works like any other cache: the most active data is stored both in the cloud and on-premises. The challenge is that a cache miss means retrieval from the cloud, so most organizations size the cache far larger than the typical best practice of 5% of capacity, and a larger on-premises cache means a much larger cost.
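The standard expected-latency formula shows why gateway caches get oversized. Average latency equals the hit ratio times local latency plus the miss ratio times cloud latency. The sketch below applies that formula. The 1 ms and 100 ms figures are hypothetical placeholders, and real gateways add effects (prefetching, write-back, metadata misses) that this simple model ignores.

```python
# Average read latency through a caching gateway (hypothetical figures).


def effective_latency_ms(hit_ratio: float, local_ms: float, cloud_ms: float) -> float:
    """Expected latency when hits are served locally and misses go to the cloud."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * local_ms + (1.0 - hit_ratio) * cloud_ms


if __name__ == "__main__":
    # Hypothetical: 1 ms from the local cache, 100 ms on a miss to the cloud.
    for ratio in (0.90, 0.99, 1.00):
        print(f"hit ratio {ratio:.2f}: {effective_latency_ms(ratio, 1.0, 100.0):.2f} ms")
```

With these placeholder numbers, a 90% hit ratio averages about 10.9 ms, roughly eleven times the all-local figure, and even a 99% hit ratio roughly doubles it. Raising the hit ratio means buying a bigger cache, which is the cost problem described above.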
Metadata is again a crucial problem with cloud gateways. While the gateway keeps metadata for cached data in the on-premises cache, in most cases the metadata for data that is not cached is stored in the cloud along with that data. Any metadata operation that spans both on-premises data and public cloud data experiences a significant performance impact.
A final challenge is that many gateway solutions are merely that: a connection between on-premises storage and public cloud storage. The organization needs an additional component to analyze its data and move it between on-premises storage and the cloud, preferably automatically.
Some solutions do provide this transparent data movement by creating a global file system that spans on-premises storage and public cloud storage. The challenge is that all data must move into this global file system to be managed, which means disruption for IT and for users. Another challenge is that global file systems are typically not cloud-native, meaning that public cloud compute resources can’t natively access the data they store. An additional software component often needs to be added to every provisioned compute resource, which further complicates cloud interoperability. Finally, global file systems may still face the same metadata challenges, depending on how and where they store their metadata.
A final option is to give up on the public cloud and use an on-premises object storage system. Most cloud storage is object-based, so this gives the organization its own private cloud storage capability. Object storage is certainly viable, but it requires an expensive upfront investment, and some organizations may not be able to justify the initial purchase until their data stores grow. It might make more sense for them to move old data to the cloud first, until their data stores grow large enough to justify an on-premises object storage system.
Beyond the latency caused by distance, both public cloud storage and on-premises object storage systems are slow at handling metadata operations, at least compared with high-performance network-attached storage (NAS) systems. Public cloud object storage therefore imposes a double latency penalty (distance plus slow metadata handling), and even an on-premises object storage system still pays the second.
An on-premises object storage system also still requires a third-party software solution to inspect the organization’s data stores and decide which data should move to the object store.
The True Hybrid Cloud Solution
When designing a hybrid cloud storage solution, the key is to think of metadata management first. Managing metadata is the key to success in many storage infrastructure designs, and a hybrid cloud storage infrastructure is no different. IT planners need to look for a hybrid cloud storage solution that treats metadata management as its first tenet, so that users don’t have the overhead of worrying about where data is stored. Such a solution acts partly as a metadata repository that routes users to the appropriate storage location as they need it, similar to a DNS lookup. The metadata-first approach eliminates the latency caused by remote metadata lookups, and users experience performance typical of on-premises storage no matter where the file resides.
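The DNS analogy can be made concrete with a small data structure. The sketch below is a minimal illustration under our own assumptions, not a description of any vendor’s product. A local catalog holds metadata for every file and answers `stat` and directory listings without touching the cloud. It also resolves each path to the tier and location that currently hold the data. `MetadataCatalog`, `FileRecord`, `Tier`, and the location strings are hypothetical names.

```python
# Minimal sketch of a "metadata-first" catalog: every file's metadata is held
# locally, and reads are routed to whichever tier currently holds the data.
# Class names, tiers, and location strings are hypothetical illustrations.
from dataclasses import dataclass, replace
from enum import Enum
from typing import Dict, List, Tuple


class Tier(Enum):
    ON_PREM = "on-premises NAS"
    CLOUD = "cloud object store"


@dataclass(frozen=True)
class FileRecord:
    path: str        # user-visible path, e.g. "/projects/a.dat"
    size: int        # bytes
    mtime: float     # modification time, seconds since the epoch
    tier: Tier       # where the data currently lives
    location: str    # tier-specific address (NAS path or object key)


class MetadataCatalog:
    """Answers metadata queries locally, no matter which tier holds the data."""

    def __init__(self) -> None:
        self._records: Dict[str, FileRecord] = {}

    def upsert(self, record: FileRecord) -> None:
        self._records[record.path] = record

    def stat(self, path: str) -> FileRecord:
        # Served from the local index: no round trip to the cloud.
        return self._records[path]

    def listdir(self, directory: str) -> List[str]:
        # Immediate children only; deeper paths appear as subdirectory names.
        prefix = directory.rstrip("/") + "/"
        children = {
            p[len(prefix):].split("/", 1)[0]
            for p in self._records
            if p.startswith(prefix)
        }
        return sorted(children)

    def resolve(self, path: str) -> Tuple[Tier, str]:
        # Like a DNS lookup: translate a name into the place to fetch it from.
        rec = self._records[path]
        return rec.tier, rec.location

    def migrate(self, path: str, tier: Tier, location: str) -> None:
        # After data moves between tiers, only the routing entry changes;
        # the user-visible path stays the same.
        self._records[path] = replace(self._records[path], tier=tier, location=location)


if __name__ == "__main__":
    catalog = MetadataCatalog()
    catalog.upsert(FileRecord("/projects/a.dat", 10, 0.0, Tier.ON_PREM, "/nas/projects/a.dat"))
    catalog.upsert(FileRecord("/projects/old/b.dat", 20, 0.0, Tier.CLOUD,
                              "archive-bucket/projects/old/b.dat"))
    print(catalog.listdir("/projects"))  # ['a.dat', 'old']
    print(catalog.resolve("/projects/old/b.dat"))
```

Listings and attribute lookups never leave the local index, so they run at local speed even for files whose data sits in the cloud. Only an actual read of cloud-resident data pays the WAN round trip. A production system would also need persistence, concurrency control, and a way to keep the index consistent with the underlying stores.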
We’ll detail this metadata-first approach in our third blog post, “Fixing Hybrid Cloud Storage.” In the meantime, sign up for our on-demand webinar that details the challenges of creating a hybrid cloud storage strategy and provides practical solutions for overcoming them.
By attending this webinar, you’ll gain:
- A clear understanding of why data gravity and the lack of metadata management break hybrid cloud
- Why most cloud solutions do not provide predictable performance in a hybrid cloud due to their lack of metadata awareness
- How to finally fix the latency and operational problems and create the ideal hybrid cloud infrastructure
Join Storage Switzerland and InfiniteIO for our on-demand webinar, “The Hybrid Cloud Data Gravity Problem and How to Fix It”.
All registrants for the live webinar also receive a free copy of Storage Switzerland’s latest eBook “Why Data Gravity is Breaking Hybrid Cloud Storage”. This ten-page eBook explains the hybrid cloud metadata problem and provides advice on how to fix it.