Public Cloud Storage solves many of the challenges facing IT planners as they try to address the rampant growth of data within their data centers. Storing this ever-growing quantity of data causes problems for both primary and secondary storage. Public Cloud Storage scales to virtually any capacity, so IT does not need to worry about upgrades or refreshes, and the organization can purchase capacity on demand.
The problem with Public Cloud Storage is its distance from the data center. Bandwidth, which has increased significantly over the last decade, is no longer the primary roadblock to the more aggressive use of Public Cloud Storage as a backend to on-premises primary storage. The primary challenge is latency: the time it takes for the Public Cloud to respond to the initial I/O request is significant. Metadata compounds the latency challenge because most I/O requests are for metadata (data about data). These requests are very small in terms of capacity, so bandwidth does little to help them, and they are highly sensitive to latency.
Even the simplest of file systems provides key metadata elements like location, creation date, modification date, last access date and file type. Applications and users poll file systems continually looking for this metadata. If the organization decides to store data in the cloud, each of these I/O operations needs to wait for the cloud to respond to the request.
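To put rough numbers on the latency problem, here is a minimal back-of-the-envelope sketch. The round-trip times and the file count are hypothetical figures chosen only for illustration, not measurements of any provider, and the model assumes each metadata request is issued one after another with no caching or batching.

```python
def total_wait_seconds(num_requests: int, round_trip_ms: float) -> float:
    """Time spent waiting when each metadata request is issued serially."""
    return num_requests * round_trip_ms / 1000.0


# Hypothetical figures, chosen only for illustration.
LOCAL_RTT_MS = 0.5          # assumed on-premises round trip per request
CLOUD_RTT_MS = 50.0         # assumed public cloud round trip per request
FILES_IN_DIRECTORY = 10_000  # assumed number of files being polled

local_wait = total_wait_seconds(FILES_IN_DIRECTORY, LOCAL_RTT_MS)
cloud_wait = total_wait_seconds(FILES_IN_DIRECTORY, CLOUD_RTT_MS)

print(f"local: {local_wait:.1f} s, cloud: {cloud_wait:.1f} s")
```

Under these assumptions the requests carry almost no data, so more bandwidth would not shorten the wait. Only the per-request round trip matters, which is why the same scan goes from seconds to minutes.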
The problems with high latency, lack of metadata management and lack of methods to intelligently move older data to the cloud severely limit the hybrid cloud use cases. Most organizations today use public cloud storage to store backup data, not primary or recently active data. The problem with using the cloud for backup is that it can get expensive. If the backup solution only leverages the cloud to store a DR copy of the backup, then the organization is effectively paying for data storage twice. Some solutions enable an organization to tier older backups to the cloud, but these capabilities only scratch the surface of public cloud storage’s potential.
The Cloud Data Management Problem
A better use case for the cloud is to leverage it as a storage component in a holistic data management strategy. Data continues to reside on the organization’s current production servers, and as it ages, the older data is migrated to an appropriate storage tier with a public cloud storage provider. A Cloud Data Management solution reduces capacity requirements on both primary storage and secondary storage.
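The age-based migration described above can be sketched as a simple selection pass. This is only an illustration under stated assumptions: the function name `find_tiering_candidates` is hypothetical, "age" is taken to mean time since last modification (a real policy might use last access instead), and the actual upload to a cloud tier is left out entirely.

```python
import os
import time
from pathlib import Path
from typing import Iterator, Optional

SECONDS_PER_DAY = 86_400


def find_tiering_candidates(
    root: Path, max_age_days: float, now: Optional[float] = None
) -> Iterator[Path]:
    """Yield files under root last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # Each stat() call is itself a metadata operation.
            if path.stat().st_mtime < cutoff:
                yield path
```

A call such as `list(find_tiering_candidates(Path("/srv/share"), 365))` would return files untouched for a year, where `/srv/share` is a placeholder path. Note that the scan is made up entirely of metadata requests, so even deciding what to move depends on fast metadata access.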
The problem with most cloud data management solutions is dealing with latency during metadata operations. Many solutions simply move the old data and expect users or IT administrators to find it manually when it is needed. Other solutions create a transparent link to the data so that the user still “sees” the data. Both methods still suffer from a metadata challenge. Any query against that data, like a directory listing, forces the user to wait through the latency of cloud retrieval.
Conclusion
Hybrid Cloud Solutions need to take a metadata-first approach to design. They need to focus on metadata acceleration as a first step so that even on-premises metadata operations see a performance improvement. Once the metadata is inventoried and managed, moving data to the cloud and dealing with latency become much easier problems to resolve.
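As a rough illustration of what "metadata-first" could mean in practice, the sketch below keeps an on-premises index of file metadata and answers listings and attribute lookups from it, so only a read of the file contents reaches the cloud. The `FileMeta` and `MetadataIndex` names are hypothetical, the cloud fetch is simulated by a counter, and this is not a description of any vendor's implementation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FileMeta:
    size_bytes: int
    modified: float  # POSIX timestamp
    tier: str        # "local" or "cloud"


class MetadataIndex:
    """Hypothetical local index: metadata stays on-premises, data may not."""

    def __init__(self) -> None:
        self._entries: Dict[str, FileMeta] = {}
        self.cloud_round_trips = 0  # counts simulated cloud fetches

    def record(self, path: str, meta: FileMeta) -> None:
        self._entries[path] = meta

    def stat(self, path: str) -> FileMeta:
        # Served from the local index; no cloud request is made.
        return self._entries[path]

    def list_directory(self, directory: str) -> List[str]:
        # Direct children only, also served from the local index.
        prefix = directory.rstrip("/") + "/"
        return sorted(
            p for p in self._entries
            if p.startswith(prefix) and "/" not in p[len(prefix):]
        )

    def read(self, path: str) -> str:
        # Only reading the contents of cloud-tier data pays the latency.
        if self._entries[path].tier == "cloud":
            self.cloud_round_trips += 1  # stand-in for a real fetch
        return f"<contents of {path}>"
```

With an index like this, a directory listing over data that has moved to the cloud completes without any cloud round trips. The user waits on cloud latency only when a file is actually opened.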
Our next entry, “The Hybrid Cloud Storage Workarounds,” covers how vendors are trying to address cloud storage latency problems and the shortcomings with those attempts. Storage Switzerland and InfiniteIO also hosted a webinar on the subject: “The Hybrid Cloud Data Gravity Problem and How to Fix It”. All registrants to the webinar also receive a free copy of Storage Switzerland’s eBook “Why Data Gravity is Breaking Hybrid Cloud Storage”.